1. INTRODUCTION
Consider a finite population of N observations. The population
is dichotomous, consisting of R "favorable" and N-R "unfavorable" subjects.
This description would apply, for example, to a finite set of consumers,
the favorability referring to their attitudes toward a new brand of
toothpaste. We should like to draw a sample of size n from this population,
but, as is almost always the case in social research, some of the subjects
will not respond when asked whether they are favorable or unfavorable
toward the product. This leads to the following further breakdown:

R1 = Number of favorables who respond.
R2 = Number of favorables who do not respond.
S1 = Number of unfavorables who respond.
N-R1-R2-S1 = Number of unfavorables who do not respond.

k = Number of respondents in sample.
n-k = Number of non-respondents in sample.

r = Number of favorable respondents in sample.
k-r = Number of unfavorable respondents in sample.

In this paper we shall deal with the problem of estimating
R = R1 + R2, the total number of favorables, in a way which minimizes a loss
function based on the error of estimation. The analysis that we propose is
Bayesian in the sense that we assign a joint prior distribution to the
parameters R1, R2, and S1 (N is assumed known) and, combining it with the
likelihood of the sample observations through Bayes' theorem, we derive
the posterior distribution of the parameters. In addition, we shall discuss
follow-up sampling of the non-respondents in the initial sample and the
determination of optimal initial and follow-up sample sizes.
The idea for this work came from a paper by H. V. Roberts,
"Informative Stopping Rules and Inferences About Population Size" [?], in
which the author applies Bayesian estimation procedures to the
capture-recapture problem in Feller [?, p. 43]. In estimating the size of a
finite population as a function of the number of tagged objects in a
recapture sample, Roberts observes that the stopping rule, i.e. the rule by
which the sample size is determined, is informative in the sense that it
carries information about the unknown parameter. In the problem posed in
this paper, if one thinks of the "real" sample size as the number of
respondents, k, then the stopping rule that determines k is a probabilistic
function of the total number of respondents in the population and is
therefore informative also. After much of the mathematical development to be
presented here had been worked out, we became aware of a paper by W. A.
Ericson [?] in which the same general problem has been treated. Ericson's
work, however, concerns a Normal process with unknown population proportion
π of respondents. Calling the mean of the respondents m1 and the mean of
the non-respondents m2, the problem is to estimate m = πm1 + (1-π)m2. The
approach is Bayesian with a Bivariate Normal prior on (m1, m2), an
independent Beta prior on π, and a quadratic loss function. There are sufficient
differences between the applications discussed here and Ericson's problem
to warrant the exposition of this work, but it is certain that the present
development would not have followed the exact path that it has were it not
for the availability of Ericson's paper.

2. FURTHER NOTATION AND ASSUMPTIONS
We assume that the n observations are chosen according to simple
random sampling. Hence the likelihood of k responses, including r favorables,
is given by the generalized hypergeometric probability function:

$$P(r,k-r \mid n,R_1,R_2,S_1,N) = f_h(r,k-r \mid n,R_1,S_1,N) = \frac{\binom{R_1}{r}\binom{S_1}{k-r}\binom{N-R_1-S_1}{n-k}}{\binom{N}{n}}. \tag{2.1}$$

We can write (2.1) in the form

$$f_h(r,k-r \mid n,R_1,S_1,N) = \frac{\binom{R_1}{r}\binom{S_1}{k-r}}{\binom{R_1+S_1}{k}} \cdot \frac{\binom{R_1+S_1}{k}\binom{N-R_1-S_1}{n-k}}{\binom{N}{n}} \tag{2.2}$$

and call the second factor the stopping probability determining k, the
number of respondents. (See the remarks above on motivation.) This point
is not of particular importance to the sequel.
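As a quick numerical check of (2.1) and (2.2), the minimal Python sketch below (our own illustration, not from the paper) evaluates the hypergeometric likelihood directly and in its factored form, and confirms that it sums to one over all observable cells (r, k). The function names `f_h` and `f_h_factored` and the population counts are illustrative assumptions.

```python
from math import comb

def f_h(r, k, n, R1, S1, N):
    """Likelihood (2.1): r favorable and k - r unfavorable respondents,
    and n - k non-respondents, in a simple random sample of n from N."""
    if r < 0 or k < r or n < k:
        return 0.0
    return comb(R1, r) * comb(S1, k - r) * comb(N - R1 - S1, n - k) / comb(N, n)

def f_h_factored(r, k, n, R1, S1, N):
    """Form (2.2): split of the k respondents times the stopping probability."""
    ways_k = comb(R1 + S1, k)
    if r < 0 or k < r or n < k or ways_k == 0:
        return 0.0
    split = comb(R1, r) * comb(S1, k - r) / ways_k
    stopping = ways_k * comb(N - R1 - S1, n - k) / comb(N, n)
    return split * stopping

# Illustrative population (arbitrary numbers): N = 20 with R1 = 6 favorable
# and S1 = 5 unfavorable respondents; simple random sample of n = 7.
N, R1, S1, n = 20, 6, 5, 7
cells = [(r, k) for k in range(n + 1) for r in range(k + 1)]
total = sum(f_h(r, k, n, R1, S1, N) for r, k in cells)
max_gap = max(abs(f_h(r, k, n, R1, S1, N) - f_h_factored(r, k, n, R1, S1, N))
              for r, k in cells)
print(total, max_gap)
```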

Note that in the sample of n we observe only three subgroups:
the r favorable respondents, the k-r unfavorable respondents and the n-k
non-respondents. Obviously, we cannot observe the attitudes toward the
product of the non-respondents in the sample, although the model specifies
a number R2 of favorables in the non-response portion of the population.
It will be useful in later discussion to refer to the "ideal" likelihood
function, i.e. that which would be appropriate if somehow we could read the
minds of the n-k non-respondents. Calling the number of favorable
non-respondents q, we would then write

$$f_h(r,k-r,q \mid n,R_1,R_2,S_1,N) = \frac{\binom{R_1}{r}\binom{S_1}{k-r}\binom{R_2}{q}\binom{N-R_1-R_2-S_1}{n-k-q}}{\binom{N}{n}}. \tag{2.3}$$


The principal goal of this paper is to show that although we are
forced to work with the "insufficient" likelihood shown in expression (2.1)
instead of the ideal in (2.3), we can make meaningful estimates of R as a
result of the introduction of a full prior probability distribution for
(R1, R2, S1).
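The link between (2.1) and (2.3) can be checked the same way: summing the ideal likelihood over the unobservable q recovers the observable likelihood (an instance of Vandermonde's identity). The sketch below is illustrative only; its parameter values are arbitrary and the function names are our own.

```python
from math import comb

def f_h(r, k, n, R1, S1, N):
    """Observable likelihood (2.1)."""
    return comb(R1, r) * comb(S1, k - r) * comb(N - R1 - S1, n - k) / comb(N, n)

def f_h_ideal(r, k, q, n, R1, R2, S1, N):
    """'Ideal' likelihood (2.3): q of the n - k non-respondents are favorable."""
    return (comb(R1, r) * comb(S1, k - r) * comb(R2, q)
            * comb(N - R1 - R2 - S1, n - k - q) / comb(N, n))

# Arbitrary illustrative values.
N, R1, R2, S1, n = 20, 6, 4, 5, 7
r, k = 2, 4
collapsed = sum(f_h_ideal(r, k, q, n, R1, R2, S1, N) for q in range(n - k + 1))
observed = f_h(r, k, n, R1, S1, N)
print(collapsed, observed)
```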

3. PRIOR PROBABILITY DISTRIBUTION
The prior distribution that we employ is a trivariate Dirichlet
mixture of multinomial probabilities. Define

$$f_m(R_1,R_2,S_1 \mid p_1,p_2,p_3,N) = \frac{N!}{R_1!\,R_2!\,S_1!\,(N-R_1-R_2-S_1)!}\,p_1^{R_1}p_2^{R_2}p_3^{S_1}(1-p_1-p_2-p_3)^{N-R_1-R_2-S_1}, \tag{3.1}$$

$$p_1,p_2,p_3 \ge 0,\quad p_1+p_2+p_3 \le 1;\qquad R_1,R_2,S_1,N \ge 0,\quad R_1+R_2+S_1 \le N.$$

This is, of course, the multinomial distribution for the random variables
(R1, R2, S1). Next define

$$f_D(p_1,p_2,p_3 \mid r_1',r_2',s_1',n') = \frac{\Gamma(n')}{\Gamma(r_1')\,\Gamma(r_2')\,\Gamma(s_1')\,\Gamma(n'-r_1'-r_2'-s_1')}\,p_1^{r_1'-1}p_2^{r_2'-1}p_3^{s_1'-1}(1-p_1-p_2-p_3)^{n'-r_1'-r_2'-s_1'-1}, \tag{3.2}$$

$$r_1',r_2',s_1',n' > 0,\quad r_1'+r_2'+s_1' < n';\qquad p_1,p_2,p_3 \ge 0,\quad p_1+p_2+p_3 \le 1.$$
This function, called the Dirichlet density and discussed in [7], is the
multivariate extension of the Beta probability function. For our prior¹
distribution for the unknown parameters (R1, R2, S1) we want

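$$P'(R_1,R_2,S_1 \mid r_1',r_2',s_1',n',N) = f_{Dm}(R_1,R_2,S_1 \mid r_1',r_2',s_1',n',N) \tag{3.3}$$

$$= \iiint_{p_1+p_2+p_3 \le 1} f_m(R_1,R_2,S_1 \mid p_1,p_2,p_3,N)\,f_D(p_1,p_2,p_3 \mid r_1',r_2',s_1',n')\,dp_1\,dp_2\,dp_3$$

$$= \frac{\binom{r_1'+R_1-1}{R_1}\binom{r_2'+R_2-1}{R_2}\binom{s_1'+S_1-1}{S_1}\binom{n'-r_1'-r_2'-s_1'+N-R_1-R_2-S_1-1}{N-R_1-R_2-S_1}}{\binom{n'+N-1}{N}}. \tag{3.4}$$

¹ We shall follow the convention of using a single prime to denote prior
probabilities and a double prime for posterior functions. When needed, a
triple prime will be introduced for a second-stage posterior.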


We call this mixture the trivariate Dirichlet-multinomial
distribution. It is a natural conjugate prior² for the ideal likelihood
displayed in (2.3). By natural conjugate we mean that with this prior
and with the ideal (and unattainable) likelihood, the posterior
distribution would be of the same family of distributions, with the posterior
parameters given by simple operations upon the prior parameters and the
sufficient sample statistics. (See [5], Sec. 3.2.)

It may be helpful to think of the finite population of known
size N as a random sample from a super-process according to which the
numbers R1, R2, and S1 are determined by the multinomial distribution (3.1).
If we then assign a Dirichlet prior-prior (3.2) to the unknown multinomial
parameters p1, p2, and p3, the expression (3.4) is the prior marginal
probability for the "sample outcomes" (R1, R2, S1), sometimes called the
predictive probability. With respect to the analysis involving the actual
sample statistics (r, k-r, n-k), this marginal distribution serves as the
prior distribution for (R1, R2, S1) given N.
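For concreteness, here is a hedged Python sketch of the prior mass function (3.4). It reads each binomial coefficient with a non-integer upper argument through the Gamma function (our assumption, needed because r1', r2', s1' and n' need not be integers), and checks that the mass sums to one and that E'(R1) = N r1'/n', as the super-process picture implies. The helper names and all parameter values are invented for illustration.

```python
from math import exp, lgamma

def nb(a, x):
    """C(a + x - 1, x) = Gamma(a + x) / (Gamma(a) x!) for real a > 0, integer x."""
    if x < 0:
        return 0.0
    return exp(lgamma(a + x) - lgamma(a) - lgamma(x + 1))

def f_Dm(R1, R2, S1, r1p, r2p, s1p, np_, N):
    """Trivariate Dirichlet-multinomial prior mass function (3.4)."""
    U = N - R1 - R2 - S1              # unfavorable non-respondents
    if min(R1, R2, S1, U) < 0:
        return 0.0
    s2p = np_ - r1p - r2p - s1p       # Dirichlet weight of the fourth cell
    return nb(r1p, R1) * nb(r2p, R2) * nb(s1p, S1) * nb(s2p, U) / nb(np_, N)

# Illustrative prior (our own numbers): r1' = 1.5, r2' = 2, s1' = 1, n' = 6.
prior = (1.5, 2.0, 1.0, 6.0)
N = 10
support = [(R1, R2, S1) for R1 in range(N + 1)
           for R2 in range(N + 1 - R1)
           for S1 in range(N + 1 - R1 - R2)]
total = sum(f_Dm(*cell, *prior, N) for cell in support)
mean_R1 = sum(cell[0] * f_Dm(*cell, *prior, N) for cell in support)
print(total, mean_R1)   # mean_R1 should equal N * r1' / n' = 2.5
```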

Note that the trivariate Dirichlet-multinomial distribution requires
the specification of four prior parameters, r1', r2', s1' and n'. In the following
section we present a short discussion of the problem of choosing these
parameters.

² We say a natural conjugate because it is not unique. The multinomial
distribution in (3.1), for example, would serve also, but it is not as flexible
as the Dirichlet-multinomial.

4. SPECIFICATION OF PRIOR PARAMETERS

If we view the population as a sample of size N from a
multinomial super-process, as suggested in the previous section, then our problem
is one of specifying the parameters of a Dirichlet distribution for the
parameters p1, p2 and p3, where

p1 = prior probability that a member of the population is
     a favorable respondent,

p2 = prior probability of a favorable non-respondent,

p3 = prior probability of an unfavorable respondent,

with p1+p2+p3 ≤ 1.

In order to aid our assessment we can take advantage of the
following theorems, proved in [7, pp. 179-80]:

THEOREM 4.1 If $(p_1,p_2,\ldots,p_k)$ is a vector random variable having the k-variate
Dirichlet distribution with parameters $(r_1',r_2',\ldots,r_k';\,n'-r_1'-r_2'-\cdots-r_k')$, then
the marginal distribution of $(p_1,\ldots,p_{k_1})$, $k_1 < k$, is the $k_1$-variate Dirichlet
distribution with parameters $(r_1',r_2',\ldots,r_{k_1}';\,n'-r_1'-r_2'-\cdots-r_{k_1}')$.

THEOREM 4.2 If $(p_1,p_2,\ldots,p_{k_1})$ is a vector random variable having the $k_1$-variate
Dirichlet distribution with parameters $(r_1',r_2',\ldots,r_{k_1}';\,n'-r_1'-r_2'-\cdots-r_{k_1}')$,
then the sum $p_1+p_2+\cdots+p_{k_1}$ has the Beta distribution with parameters
$(r_1'+r_2'+\cdots+r_{k_1}';\,n'-r_1'-r_2'-\cdots-r_{k_1}')$.

THEOREM 4.3 If $(p_1,p_2,\ldots,p_k)$ is a vector random variable having the same
Dirichlet distribution as above, then conditional on $p_1,p_2,\ldots,p_{k_1-1}$ the ratio
$p_{k_1}/(1-p_1-\cdots-p_{k_1-1})$ has the Beta distribution with parameters
$(r_{k_1}';\,n'-r_1'-r_2'-\cdots-r_{k_1}')$.

Rather than attempt to assess all parameters in one stroke, a more
natural approach to the problem is to ask oneself first to assess the prior
distribution of the probability that a member of the population is favorable (p1+p2
in our notation). The next question is, "Given that a person is favorable, what
is the probability that he will respond?" And finally, "Given that a person
is unfavorable, what is the probability of response?"

By Theorem 4.1, if (p1, p2, p3) is to be Dirichlet distributed,
then (p1, p2) is also Dirichlet, and by Theorem 4.2, the sum p1+p2 must be
Beta distributed. The first step, then, is to assess a Beta distribution for
p1+p2, specifying parameters r' and n'-r'. Possible aids to one's subjective
judgment are discussed in [?, ch. 11].

The next step is to specify the distribution of p1/(p1+p2),
conditional on the value of p1+p2. According to Theorem 4.3, this
distribution must be Beta with parameters r1' and r'-r1'. In other words, given r'
from the first stage of the assessment, r1' must be chosen to be less than r', and
we implicitly define r2' = r'-r1', so that r' = r1'+r2'.

The final step involves the assessment of the distribution of
p3/(1-p1-p2), the probability of response, conditional on an unfavorable
attitude. The distribution must be Beta with parameters s1' and n'-r1'-r2'-s1';
hence, given n'-r' = n'-r1'-r2', we must choose s1' < n'-r1'-r2'.

We have now chosen parameters r1', r2', s1' and n' in a manner that
assures that r1'+r2'+s1' < n'. With the inclusion of the known population size,
N, the required trivariate Dirichlet-multinomial distribution for (R1, R2, S1)
has been fully specified.³


³ We are not suggesting that the assessment of the prior distribution is a
simple problem, nor that other approaches cannot lead to implied parameters
that are inconsistent with the results of the suggested method. It is not
inconceivable, however, that a computer could be programmed to compare several
alternative assessments, report inconsistencies and violations of restrictions,
and present visual displays of the implied distributions in order to aid the
decision maker. Antagonists may indeed refer to this suggestion as a deus ex
machina.
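The three-step assessment lends itself to the kind of computer check envisaged in footnote 3. The hypothetical helper below (our own sketch, not a program described in the paper) turns the three Beta assessments into (r1', r2', s1', n'), rejects violations of the restrictions, and reports the implied prior means, using the standard Beta mean a/(a+b).

```python
def assess_prior(r_prime, n_prime, r1_prime, s1_prime):
    """Map the three-step assessment of Section 4 to (r1', r2', s1', n').

    Step 1: p1 + p2            ~ Beta(r', n' - r')
    Step 2: p1 / (p1 + p2)     ~ Beta(r1', r' - r1'), hence r2' = r' - r1'
    Step 3: p3 / (1 - p1 - p2) ~ Beta(s1', n' - r' - s1')
    """
    if not 0 < r_prime < n_prime:
        raise ValueError("step 1 needs 0 < r' < n'")
    if not 0 < r1_prime < r_prime:
        raise ValueError("step 2 needs 0 < r1' < r'")
    if not 0 < s1_prime < n_prime - r_prime:
        raise ValueError("step 3 needs 0 < s1' < n' - r'")
    return r1_prime, r_prime - r1_prime, s1_prime, n_prime

def implied_means(r1p, r2p, s1p, np_):
    """Prior means of p1, p2, p3 and of the two conditional response rates."""
    return {
        "E(p1)": r1p / np_,
        "E(p2)": r2p / np_,
        "E(p3)": s1p / np_,
        "E(response | favorable)": r1p / (r1p + r2p),
        "E(response | unfavorable)": s1p / (np_ - r1p - r2p),
    }

params = assess_prior(r_prime=2, n_prime=4, r1_prime=1, s1_prime=1)
print(params)                  # rectangular case: r1' = r2' = s1' = 1, n' = 4
print(implied_means(*params))
```

Raising an error rather than silently adjusting the inputs mirrors the suggestion that such a program should report violations back to the decision maker.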


5. THE JOINT POSTERIOR DISTRIBUTION
After the observation of k respondents and r favorables out of the
sample of n, we apply Bayes' theorem and obtain the joint posterior
distribution of (R1-r, S1-k+r, R2) conditional on the specification of r1', r2', s1', n',
N and n, and the observed sample statistics r and k. With the prior mass
function (3.4) and the likelihood given in (2.1) we obtain

$$P''(R_1-r,\,S_1-k+r,\,R_2 \mid r,k,n,r_1',r_2',s_1',n',N) \propto f_{Dm}(R_1,R_2,S_1 \mid r_1',r_2',s_1',n',N)\,f_h(r,k-r \mid n,R_1,S_1,N). \tag{5.1}$$

In order to normalize this posterior distribution we need an expression
for the marginal distribution of (r, k-r):

$$P'(r,k-r \mid r_1',r_2',s_1',n',N,n) = \sum_{R_1=r}^{N-n+r}\ \sum_{S_1=k-r}^{N-n+k-R_1}\ \sum_{R_2=0}^{N-R_1-S_1} f_{Dm}(R_1,R_2,S_1)\,f_h(r,k-r \mid n,R_1,S_1,N). \tag{5.2}$$

The triple summation in (5.2) yields

$$P'(r,k-r \mid r_1',r_2',s_1',n',N,n) = f_{Dm}(r,k-r \mid r_1',s_1',n',n). \tag{5.3}$$

In other words, if we assess a Dirichlet-multinomial prior for
(R1, R2, S1), the predictive probability distribution for the statistic
(r, k-r) given n is also Dirichlet-multinomial.

We present the proof of (5.3) because it involves the following very
useful lemma, one which keeps popping up in subsequent developments:

LEMMA 5.1

$$\sum_{j=0}^{k}\binom{a+k-j-1}{k-j}\binom{b+j-1}{j} = \binom{a+b+k-1}{k}. \tag{5.4}$$

Lemma 5.1 is a well-known problem in Feller's Volume I [?, p. 62]. Dubey
[1] proves it in four ways different from Feller's suggested method.
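Lemma 5.1 is the negative-binomial form of Vandermonde's convolution: both sides are the coefficient of t^k in (1-t)^(-a)(1-t)^(-b) = (1-t)^(-(a+b)), so it holds for real a, b > 0 when the binomial coefficients are read through the Gamma function. A quick numerical sketch with arbitrary test values (the helper names are our own):

```python
from math import exp, lgamma

def gbinom(u, v):
    """Generalized binomial C(u, v) = Gamma(u+1) / (Gamma(v+1) Gamma(u-v+1)),
    for real u > -1 and integer v >= 0 with u - v > -1."""
    return exp(lgamma(u + 1) - lgamma(v + 1) - lgamma(u - v + 1))

def lemma_5_1(a, b, k):
    """Return (left side, right side) of (5.4) for real a, b > 0, integer k >= 0."""
    lhs = sum(gbinom(a + k - j - 1, k - j) * gbinom(b + j - 1, j)
              for j in range(k + 1))
    rhs = gbinom(a + b + k - 1, k)
    return lhs, rhs

cases = [(3, 4, 5), (2.5, 0.7, 6), (0.3, 1.9, 10)]
for a, b, k in cases:
    print(a, b, k, *lemma_5_1(a, b, k))
```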
To prove (5.3) we write, according to (5.2),

$$P'(r,k-r \mid r_1',r_2',s_1',n',N,n) = \sum_{R_1=r}^{N-n+r}\ \sum_{S_1=k-r}^{N-n+k-R_1} \frac{\binom{r_1'+R_1-1}{R_1}\binom{s_1'+S_1-1}{S_1}\binom{R_1}{r}\binom{S_1}{k-r}\binom{N-R_1-S_1}{n-k}}{\binom{n'+N-1}{N}\binom{N}{n}} \sum_{R_2=0}^{N-R_1-S_1}\binom{n'-r_1'-r_2'-s_1'+N-R_1-S_1-R_2-1}{N-R_1-S_1-R_2}\binom{r_2'+R_2-1}{R_2}. \tag{5.5}$$

Letting n'-r1'-r2'-s1' in the last summation play the role of a, r2'
the role of b, and N-R1-S1 and R2 the roles of k and j, respectively, we
invoke Lemma 5.1 and (5.5) becomes

$$P'(r,k-r \mid r_1',r_2',s_1',n',N,n) = \sum_{R_1=r}^{N-n+r}\ \sum_{S_1=k-r}^{N-n+k-R_1} \frac{\binom{r_1'+R_1-1}{R_1}\binom{s_1'+S_1-1}{S_1}\binom{R_1}{r}\binom{S_1}{k-r}\binom{N-R_1-S_1}{n-k}\binom{n'-r_1'-s_1'+N-R_1-S_1-1}{N-R_1-S_1}}{\binom{n'+N-1}{N}\binom{N}{n}}. \tag{5.6}$$

Explicit evaluation of the binomial coefficients, cancellation of
like factors in numerator and denominator, and the introduction of like
factors lead to

$$\sum_{R_1=r}^{N-n+r}\frac{(r_1'+R_1-1)!\,(n'-1)!\,n!\,(N-n)!\,(s_1'+k-r-1)!\,(n'-r_1'-s_1'+n-k-1)!}{(r_1'-1)!\,(R_1-r)!\,r!\,(n'+N-1)!\,(s_1'-1)!\,(n'-r_1'-s_1'-1)!\,(k-r)!\,(n-k)!}\sum_{S_1-k+r=0}^{N-n+r-R_1}\binom{n'-r_1'-s_1'+N-R_1-S_1-1}{N-n+r-R_1-S_1+k-r}\binom{s_1'+S_1-1}{S_1-k+r}. \tag{5.7}$$


Again we apply Lemma 5.1 to the last summation and find that it equals
$\binom{n'-r_1'+N-R_1-1}{N-n-R_1+r}$. Introduction of the factor (r1'+r-1)! in the numerator and
denominator of (5.7), and rearrangement, results in

$$\frac{(r_1'+r-1)!\,(n'-1)!\,n!\,(N-n)!\,(s_1'+k-r-1)!\,(n'-r_1'-s_1'+n-k-1)!}{(r_1'-1)!\,r!\,(n'+N-1)!\,(s_1'-1)!\,(n'-r_1'-s_1'-1)!\,(k-r)!\,(n-k)!}\sum_{R_1-r=0}^{N-n}\binom{r_1'+R_1-1}{R_1-r}\binom{n'-r_1'+N-R_1-1}{N-n-R_1+r}. \tag{5.8}$$

By Lemma 5.1 again the above summation equals $\binom{n'+N-1}{N-n}$. Cancelling
and rearranging, we finally find that (5.8) equals

$$\frac{\binom{r_1'+r-1}{r}\binom{s_1'+k-r-1}{k-r}\binom{n'-r_1'-s_1'+n-k-1}{n-k}}{\binom{n'+n-1}{n}} = f_{Dm}(r,k-r \mid r_1',s_1',n',n), \tag{5.9}$$

the desired result.

It is, incidentally, the successive application of Lemma 5.1 that
enables one to prove that expressions (5.9) and (3.4) sum to one over all
values of the appropriate joint random variables.
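The content of (5.3) and (5.9) can also be confirmed by brute force: carry out the triple sum (5.2) over the whole prior support and compare it with the closed form. The sketch below does this for a small, arbitrarily chosen N, n and prior; the helper names are our own, and non-integer prior parameters are handled through the Gamma function.

```python
from math import comb, exp, lgamma

def nb(a, x):
    """C(a + x - 1, x) for real a > 0 and integer x (zero when x < 0)."""
    if x < 0:
        return 0.0
    return exp(lgamma(a + x) - lgamma(a) - lgamma(x + 1))

def f_Dm3(R1, R2, S1, r1p, r2p, s1p, np_, N):
    """Trivariate prior (3.4)."""
    U = N - R1 - R2 - S1
    if min(R1, R2, S1, U) < 0:
        return 0.0
    return (nb(r1p, R1) * nb(r2p, R2) * nb(s1p, S1)
            * nb(np_ - r1p - r2p - s1p, U) / nb(np_, N))

def f_h(r, k, n, R1, S1, N):
    """Likelihood (2.1)."""
    return comb(R1, r) * comb(S1, k - r) * comb(N - R1 - S1, n - k) / comb(N, n)

def predictive_bruteforce(r, k, n, r1p, r2p, s1p, np_, N):
    """The triple sum (5.2), taken over the whole prior support."""
    return sum(f_Dm3(R1, R2, S1, r1p, r2p, s1p, np_, N) * f_h(r, k, n, R1, S1, N)
               for R1 in range(N + 1)
               for R2 in range(N + 1 - R1)
               for S1 in range(N + 1 - R1 - R2))

def predictive_closed(r, k, n, r1p, s1p, np_):
    """Closed form (5.9): bivariate Dirichlet-multinomial in (r, k - r)."""
    return nb(r1p, r) * nb(s1p, k - r) * nb(np_ - r1p - s1p, n - k) / nb(np_, n)

r1p, r2p, s1p, np_, N, n = 1.5, 1.0, 2.0, 5.5, 8, 4   # illustrative values
gaps = [abs(predictive_bruteforce(r, k, n, r1p, r2p, s1p, np_, N)
            - predictive_closed(r, k, n, r1p, s1p, np_))
        for k in range(n + 1) for r in range(k + 1)]
print(max(gaps))
```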

Summation over r shows that the predictive probability of k
respondents is Beta-binomial:

$$P'(k \mid r_1',r_2',s_1',n',N,n) = f_{\beta b}(k \mid r_1'+s_1',\,n',\,n). \tag{5.10}$$

Note that, similar to the results of Theorem 4.2, the Beta
parameters in (5.10) are r1'+s1' and n'. In fact, using Lemma 5.1, the hierarchical
properties of the Dirichlet distribution discussed in [7] can be shown to hold
for the Dirichlet-multinomial mass function. That is, subsets of the original
variables, sums, and conditional ratios are all lower-order Dirichlet-multinomial
distributed.
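Likewise, a short sketch can confirm (5.10) by summing (5.9) over r. The function `f_bb` assumes the Beta-binomial parameterization f_βb(x | ρ, ν, m) = C(ρ+x-1, x) C(ν-ρ+m-x-1, m-x) / C(ν+m-1, m), which is the form consistent with (5.10) and (5.18); the numbers are illustrative.

```python
from math import exp, lgamma

def nb(a, x):
    """C(a + x - 1, x) for real a > 0 and integer x (zero when x < 0)."""
    if x < 0:
        return 0.0
    return exp(lgamma(a + x) - lgamma(a) - lgamma(x + 1))

def f_Dm2(r, k, n, r1p, s1p, np_):
    """Predictive mass (5.9) of (r, k - r) given n."""
    return nb(r1p, r) * nb(s1p, k - r) * nb(np_ - r1p - s1p, n - k) / nb(np_, n)

def f_bb(x, rho, nu, m):
    """Beta-binomial f_bb(x | rho, nu, m) in the parameterization described above."""
    return nb(rho, x) * nb(nu - rho, m - x) / nb(nu, m)

r1p, s1p, np_, n = 1.5, 2.0, 5.5, 6          # illustrative values
marginal_k = [sum(f_Dm2(r, k, n, r1p, s1p, np_) for r in range(k + 1))
              for k in range(n + 1)]
beta_binomial_k = [f_bb(k, r1p + s1p, np_, n) for k in range(n + 1)]
print(marginal_k)
print(beta_binomial_k)
```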

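Continuing the development of posterior distributions, we divide the
right-hand side of (5.1) by the result in (5.9) and obtain

$$P''(R_1-r,\,R_2,\,S_1-k+r \mid r,k,n,r_1',r_2',s_1',n',N) = \frac{\binom{r_1'+R_1-1}{R_1-r}\binom{r_2'+R_2-1}{R_2}\binom{s_1'+S_1-1}{S_1-k+r}\binom{n'-r_1'-r_2'-s_1'+N-R_1-R_2-S_1-1}{N-R_1-R_2-S_1}}{\binom{n'+N-1}{N-n}}\cdot\frac{\binom{N-R_1-S_1}{n-k}}{\binom{n'-r_1'-s_1'+n-k-1}{n-k}}. \tag{5.11}$$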

It is interesting to compare the joint posterior mass function in (5.11)
with that which would result if we were able to apply the ideal likelihood (2.3) to
the prior shown in (3.4). One can straightforwardly show that in the ideal case

$$P''(R_1-r,\,S_1-k+r,\,R_2-q \mid r,k,q,n,r_1',r_2',s_1',n',N) = \frac{\binom{r_1'+R_1-1}{R_1-r}\binom{r_2'+R_2-1}{R_2-q}\binom{s_1'+S_1-1}{S_1-k+r}\binom{n'-r_1'-r_2'-s_1'+N-R_1-R_2-S_1-1}{N-R_1-R_2-S_1-n+k+q}}{\binom{n'+N-1}{N-n}}, \tag{5.12}$$

where q is the somehow observable number of favorable non-respondents in the sample.

Note that the expression in (5.12) is of the same form as the prior
mass function (3.4) and is therefore Dirichlet-multinomial. The posterior
parameters are

$$r_1'' = r_1'+r,\qquad r_2'' = r_2'+q,\qquad s_1'' = s_1'+k-r,\qquad n'' = n'+n.$$

We see that the posterior parameters are obtained simply by adding the
sufficient sample statistics (r, q, k-r and n) to the prior parameters.

The left-hand factor in (5.11) is similar to (5.12) with respect to
the first three binomial coefficients of the numerator and the term in the
denominator, but there the similarity ends. The fourth binomial coefficient
does not involve the sample non-respondents, as it does in (5.12). Instead, the
whole expression is weighted by an adjustment factor reflecting the uncertainty
concerning the composition of the remaining n-k observations.

The next step is to examine the lower-order joint and marginal
distributions implied by (5.11).

One might expect that since the sufficient statistics r and k-r
are available for favorable and unfavorable respondents, respectively, the
posterior distribution for R1-r and S1-k+r would be the same as in the ideal
case; and it can indeed be shown by summing over R2 that

$$P''(R_1-r,\,S_1-k+r) = f_{Dm}(R_1-r,\,S_1-k+r \mid r_1'+r,\,s_1'+k-r,\,n'+n,\,N-n). \tag{5.13}$$

It follows that R1-r and S1-k+r are each marginally Beta-binomial
distributed (see [5, pp. 237-38] for details of this distribution); and
in particular, one can show that

$$E''(R_1-r) = (N-n)\,\frac{r_1'+r}{n'+n}, \tag{5.14}$$

which will be of use in subsequent developments.
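To see (5.11) and (5.14) at work, the sketch below normalizes prior × likelihood numerically for a small population and compares the result with the closed form (5.11), i.e. the left-hand factor times the adjustment factor, and with the posterior mean (5.14). The prior parameters and the sample (n = 5, k = 3, r = 2) are hypothetical, and the helper names are our own.

```python
from math import comb, exp, lgamma

def nb(a, x):
    """C(a + x - 1, x) for real a > 0 and integer x (zero when x < 0)."""
    if x < 0:
        return 0.0
    return exp(lgamma(a + x) - lgamma(a) - lgamma(x + 1))

def f_Dm3(R1, R2, S1, r1p, r2p, s1p, np_, N):
    """Prior (3.4)."""
    U = N - R1 - R2 - S1
    if min(R1, R2, S1, U) < 0:
        return 0.0
    return (nb(r1p, R1) * nb(r2p, R2) * nb(s1p, S1)
            * nb(np_ - r1p - r2p - s1p, U) / nb(np_, N))

def f_h(r, k, n, R1, S1, N):
    """Likelihood (2.1)."""
    return comb(R1, r) * comb(S1, k - r) * comb(N - R1 - S1, n - k) / comb(N, n)

def posterior_bruteforce(r, k, n, r1p, r2p, s1p, np_, N):
    """Bayes' theorem (5.1), normalized numerically over the prior support."""
    joint = {}
    for R1 in range(N + 1):
        for R2 in range(N + 1 - R1):
            for S1 in range(N + 1 - R1 - R2):
                w = f_Dm3(R1, R2, S1, r1p, r2p, s1p, np_, N) * f_h(r, k, n, R1, S1, N)
                if w > 0:
                    joint[(R1, R2, S1)] = w
    total = sum(joint.values())
    return {cell: w / total for cell, w in joint.items()}

def posterior_closed(R1, R2, S1, r, k, n, r1p, r2p, s1p, np_, N):
    """Closed form (5.11): left-hand factor times the adjustment factor."""
    U = N - R1 - R2 - S1
    if min(R1 - r, S1 - (k - r), U, N - R1 - S1 - (n - k)) < 0:
        return 0.0
    left = (nb(r1p + r, R1 - r) * nb(r2p, R2) * nb(s1p + k - r, S1 - k + r)
            * nb(np_ - r1p - r2p - s1p, U) / nb(np_ + n, N - n))
    adjustment = comb(N - R1 - S1, n - k) / nb(np_ - r1p - s1p, n - k)
    return left * adjustment

r1p, r2p, s1p, np_, N = 1.5, 1.0, 2.0, 5.5, 10
n, k, r = 5, 3, 2
post = posterior_bruteforce(r, k, n, r1p, r2p, s1p, np_, N)
max_gap = max(abs(p - posterior_closed(*cell, r, k, n, r1p, r2p, s1p, np_, N))
              for cell, p in post.items())
closed_total = sum(posterior_closed(R1, R2, S1, r, k, n, r1p, r2p, s1p, np_, N)
                   for R1 in range(N + 1) for R2 in range(N + 1 - R1)
                   for S1 in range(N + 1 - R1 - R2))
mean_R1_minus_r = sum(p * (cell[0] - r) for cell, p in post.items())
formula_5_14 = (N - n) * (r1p + r) / (np_ + n)
print(max_gap, closed_total, mean_R1_minus_r, formula_5_14)
```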

In order to derive the marginal posterior distribution for R2, the
number of favorable non-respondents, we first require the following lemma:

LEMMA 5.2 Let $x^{[p]} = x(x-1)(x-2)\cdots(x-p+1)$, defined for all real x and
positive integers p. We further specify

$$x^{[p]} = \begin{cases} 0 & \text{when } p < 0 \text{ or } p > x,\\ 1 & \text{when } p = 0.\end{cases}$$

Given parameters a, b > 0 with a < b, and nonnegative integers x, c and p
with x ≤ c,

$$\sum_{x=0}^{c} x^{[p]}\binom{a+x-1}{x}\binom{b+c-a-x-1}{c-x} = (a+p-1)^{[p]}\binom{b+c-1}{c-p}. \tag{5.15}$$
Proof: We note that the left-hand side of (5.15) can be written as
(terms with x < p vanish)

$$\sum_{x=0}^{c}\frac{(a+x-1)!}{(a-1)!\,(x-p)!}\binom{b+c-a-x-1}{c-x} \tag{5.16}$$

$$= (a+p-1)^{[p]}\sum_{x=0}^{c}\binom{a+x-1}{x-p}\binom{b-a+c-p-x+p-1}{c-p-x+p} \tag{5.17}$$

$$= (a+p-1)^{[p]}\binom{b+c-1}{c-p} \quad \text{by Lemma 5.1.}$$


We remark in passing that with Lemma 5.2 one can show immediately
that if a random variable r is distributed according to the Beta-binomial
distribution with parameters (r', n', n), then the pth factorial moment of r
is given by

$$E\bigl(r^{[p]}\bigr) = (r'+p-1)^{[p]}\,\frac{\binom{n'+n-1}{n-p}}{\binom{n'+n-1}{n}}. \tag{5.18}$$
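Formula (5.18) is easy to check against a direct computation of the factorial moments. The sketch assumes the same Beta-binomial parameterization as above, with illustrative values r' = 1.7, n' = 4.2 and n = 9; for p = 1 it reduces to the familiar mean n r'/n'.

```python
from math import exp, lgamma

def nb(a, x):
    """C(a + x - 1, x) for real a > 0 and integer x (zero when x < 0)."""
    if x < 0:
        return 0.0
    return exp(lgamma(a + x) - lgamma(a) - lgamma(x + 1))

def gbinom(u, v):
    """C(u, v) for real u and integer v >= 0 with u - v > -1."""
    return exp(lgamma(u + 1) - lgamma(v + 1) - lgamma(u - v + 1))

def falling(x, p):
    """Falling factorial x^[p] = x (x - 1) ... (x - p + 1), with x^[0] = 1."""
    out = 1.0
    for i in range(p):
        out *= x - i
    return out

def f_bb(x, rp, np_, n):
    """Beta-binomial mass with parameters (r', n', n)."""
    return nb(rp, x) * nb(np_ - rp, n - x) / nb(np_, n)

def factorial_moment_direct(p, rp, np_, n):
    return sum(falling(x, p) * f_bb(x, rp, np_, n) for x in range(n + 1))

def factorial_moment_5_18(p, rp, np_, n):
    """Right-hand side of (5.18)."""
    return falling(rp + p - 1, p) * gbinom(np_ + n - 1, n - p) / gbinom(np_ + n - 1, n)

rp, np_, n = 1.7, 4.2, 9
rows = [(p, factorial_moment_direct(p, rp, np_, n), factorial_moment_5_18(p, rp, np_, n))
        for p in range(n + 1)]
for row in rows:
    print(row)
```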


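Now we must first sum expression (5.11), the joint posterior
for (R1-r, R2, S1-k+r), with respect to the variable S1-k+r. Letting

$$K = \frac{\binom{r_1'+R_1-1}{R_1-r}\binom{r_2'+R_2-1}{R_2}\,(n'-r_1'-s_1'-1)!}{\binom{n'+N-1}{N-n}\,(n'-r_1'-s_1'+n-k-1)!} \tag{5.19}$$

and collecting terms involving S1, we write

$$P''(R_2,\,R_1-r) = K\sum_{S_1-k+r=0}^{N-R_1-R_2-k+r}(N-R_1-S_1)^{[n-k]}\binom{s_1'+S_1-1}{S_1-k+r}\binom{n'-r_1'-r_2'-s_1'+N-R_1-R_2-S_1-1}{N-R_1-R_2-S_1} \tag{5.20}$$

$$= K\sum_{S_1-k+r=0}^{N-R_1-R_2-k+r}\bigl(R_2+N-R_1-R_2-S_1\bigr)^{[n-k]}\binom{s_1'+S_1-1}{S_1-k+r}\binom{n'-r_1'-r_2'-s_1'+N-R_1-R_2-S_1-1}{N-R_1-R_2-S_1}. \tag{5.21}$$

Expanding the factorial power by the binomial theorem for factorial powers,
$(x+y)^{[m]} = \sum_{j=0}^{m}\binom{m}{j}x^{[j]}y^{[m-j]}$, gives

$$= K\sum_{S_1-k+r=0}^{N-R_1-R_2-k+r}\ \sum_{j=0}^{n-k}\binom{n-k}{j}R_2^{[j]}\,(N-R_1-R_2-S_1)^{[n-k-j]}\binom{s_1'+S_1-1}{S_1-k+r}\binom{n'-r_1'-r_2'-s_1'+N-R_1-R_2-S_1-1}{N-R_1-R_2-S_1} \tag{5.22}$$

$$= K\sum_{j=0}^{n-k}\binom{n-k}{j}R_2^{[j]}\sum_{S_1-k+r=0}^{N-R_1-R_2-k+r}(N-R_1-R_2-S_1)^{[n-k-j]}\binom{s_1'+S_1-1}{S_1-k+r}\binom{n'-r_1'-r_2'-s_1'+N-R_1-R_2-S_1-1}{N-R_1-R_2-S_1}. \tag{5.23}$$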


Now, since summation of S1-k+r from 0 to N-R1-R2-k+r is equivalent to summation
of N-R1-R2-S1 over the same range, from N-R1-R2-k+r down to 0, we can apply
Lemma 5.2 to the second summation in (5.23) (with x = N-R1-R2-S1,
c = N-R1-R2-k+r, p = n-k-j, a = n'-r1'-r2'-s1' and b = n'-r1'-r2'+k-r) and write

$$P''(R_2,\,R_1-r) = K\sum_{j=0}^{n-k}\binom{n-k}{j}R_2^{[j]}\,(n'-r_1'-r_2'-s_1'+n-k-j-1)^{[n-k-j]}\binom{n'-r_1'-r_2'+N-R_1-R_2-1}{N-R_1-R_2-n+r+j}. \tag{5.24}$$
The next step is to define a new factor, constant with respect to R1,

$$G = \frac{\binom{r_2'+R_2-1}{R_2}\,(n'-r_1'-s_1'-1)!}{\binom{n'+N-1}{N-n}\,(n'-r_1'-s_1'+n-k-1)!}, \tag{5.25}$$

collect terms involving R1-r, and sum expression (5.24) over the range of that
variable. We write


$$P''(R_2) = G\sum_{j=0}^{n-k}\binom{n-k}{j}R_2^{[j]}\,(n'-r_1'-r_2'-s_1'+n-k-j-1)^{[n-k-j]}\sum_{R_1-r=0}^{N-R_2-k}\binom{r_1'+R_1-1}{R_1-r}\binom{n'-r_1'-r_2'+N-R_1-R_2-1}{N-R_2-n+r+j-R_1}. \tag{5.26}$$

Since R1-r > N-R2-n+j implies that the last binomial coefficient in (5.26) is
equal to zero, and since n-j ≥ k, we can substitute N-R2-n+j for the upper
limit of the range in the second summation.

Application of Lemma 5.1 to the summation above then leads to

$$P''(R_2) = G\sum_{j=0}^{n-k}\binom{n-k}{j}R_2^{[j]}\,(n'-r_1'-r_2'-s_1'+n-k-j-1)^{[n-k-j]}\binom{n'-r_2'+N-R_2-1}{N-R_2-n+j}. \tag{5.27}$$


Writing out G explicitly and rearranging, we can write (5.27) in the following
form:

$$P''(R_2 \mid r,k,n,r_1',r_2',s_1',n',N) = \sum_{j=0}^{n-k}\frac{\binom{r_2'+j-1}{j}\binom{n'-r_1'-s_1'-r_2'+n-k-j-1}{n-k-j}}{\binom{n'-r_1'-s_1'+n-k-1}{n-k}}\cdot\frac{\binom{r_2'+R_2-1}{R_2-j}\binom{n'-r_2'+N-R_2-1}{N-n-R_2+j}}{\binom{n'+N-1}{N-n}}. \tag{5.28}$$


Now, if we call the first factor in (5.28) w_j, and if we recognize the second
factor as the Beta-binomial mass function f_βb(R2-j | r2'+j, n'+n, N-n),
we can write (5.28) as

$$P''(R_2 \mid r,k,n,r_1',r_2',s_1',n',N) = \sum_{j=0}^{n-k} w_j\,f_{\beta b}(R_2-j \mid r_2'+j,\,n'+n,\,N-n), \tag{5.29}$$

with the following interpretation:

If the number of favorable non-respondents in the sample were known
to be some fixed value of j ≤ n-k, call it q, then through the ideal
Dirichlet-multinomial posterior (5.12) we would find the posterior marginal
distribution of R2-q to be Beta-binomial,

$$P''(R_2-q) = f_{\beta b}(R_2-q \mid r_2'+q,\,n'+n,\,N-n). \tag{5.30}$$

But the exact value of j is not known, and therefore the actual
marginal posterior mass function for R2 in (5.29) is a weighted average of
Beta-binomial mass functions. Furthermore, the weights w_j are themselves
Beta-binomial probabilities, namely w_j = f_βb(j | r2', n'-r1'-s1', n-k). Each
w_j can be interpreted as the prior predictive probability of j favorable
non-respondents in the sample, given n-k non-respondents in all.

The proof that expression (5.27) sums to one over the range
of R2, and is therefore a proper mass function, is quickly achieved by the
application of Lemma 5.2.
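The mixture representation (5.29) can be compared with a brute-force marginal of R2 obtained from (5.1). In the sketch, the weights are w_j = f_βb(j | r2', n'-r1'-s1', n-k), which is what the first factor of (5.28) reduces to; note that the mixture does not involve r at all. Parameter values and helper names are illustrative assumptions.

```python
from math import comb, exp, lgamma

def nb(a, x):
    """C(a + x - 1, x) for real a > 0 and integer x (zero when x < 0)."""
    if x < 0:
        return 0.0
    return exp(lgamma(a + x) - lgamma(a) - lgamma(x + 1))

def f_bb(x, rho, nu, m):
    """Beta-binomial f_bb(x | rho, nu, m); zero outside 0 <= x <= m."""
    return nb(rho, x) * nb(nu - rho, m - x) / nb(nu, m)

def r2_posterior_bruteforce(r, k, n, r1p, r2p, s1p, np_, N):
    """P''(R2) from prior (3.4) x likelihood (2.1), summing out R1 and S1."""
    s2p = np_ - r1p - r2p - s1p
    marg = [0.0] * (N + 1)
    for R1 in range(N + 1):
        for R2 in range(N + 1 - R1):
            for S1 in range(N + 1 - R1 - R2):
                marg[R2] += (nb(r1p, R1) * nb(r2p, R2) * nb(s1p, S1)
                             * nb(s2p, N - R1 - R2 - S1)
                             * comb(R1, r) * comb(S1, k - r) * comb(N - R1 - S1, n - k))
    total = sum(marg)
    return [w / total for w in marg]

def r2_posterior_mixture(k, n, r1p, r2p, s1p, np_, N):
    """Mixture (5.29) with weights w_j = f_bb(j | r2', n' - r1' - s1', n - k)."""
    weights = [f_bb(j, r2p, np_ - r1p - s1p, n - k) for j in range(n - k + 1)]
    return [sum(w * f_bb(R2 - j, r2p + j, np_ + n, N - n) for j, w in enumerate(weights))
            for R2 in range(N + 1)]

r1p, r2p, s1p, np_, N = 1.5, 1.0, 2.0, 5.5, 12
n, k, r = 6, 2, 1
brute = r2_posterior_bruteforce(r, k, n, r1p, r2p, s1p, np_, N)
mixture = r2_posterior_mixture(k, n, r1p, r2p, s1p, np_, N)
print(max(abs(b - m) for b, m in zip(brute, mixture)))
```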

6. FIRST STAGE LOSS FUNCTION AND OPTIMAL ESTIMATION
Our aim is to estimate R = R1+R2. We shall call the chosen
estimator R̂.

Associated with R̂ and each possible "true" value of R = R1+R2 is
the value of a loss function based on the error of estimation. Including
as arguments of this function the sample size n and the statistics r and k,
as well as R̂ and R, we assume that the loss is quadratic in form:

$$L(\hat R,R,r,k,n) = k_t(\hat R-R)^2 + K_s + k_s n, \tag{6.1}$$

where K_s is the fixed cost of initiating the sampling, k_s is the constant
incremental cost of sampling, and k_t is a constant expressing the trade-off
between squared estimation error and the cost of sampling.

With a given n, our decision procedure is to choose R̂ in order to
minimize the expected posterior loss, i.e.,

$$\min_{\hat R} E''L(\hat R,R,r,k,n) = \min_{\hat R}\bigl[k_t\,E''(\hat R-R)^2\bigr] + K_s + k_s n; \tag{6.2}$$

and, as is well known [3, sec. 6.3], the optimal estimator R̂* is the
posterior expected value of R:

$$\hat R^* = E''(R_1+R_2). \tag{6.3}$$


The minimum expected loss conditional on n is thus given by

$$\min_{\hat R} E''L(\hat R,R,r,k,n) = k_t\operatorname{Var}''(R) + K_s + k_s n. \tag{6.4}$$

The posterior expected value of R1-r is displayed in (5.14).

The next step is to evaluate

$$E''(R_2) = \sum_{R_2=0}^{N-k} R_2\,P''(R_2 \mid r,k,n,r_1',r_2',s_1',n',N). \tag{6.5}$$
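Substituting from expression (5.27), with G defined in (5.25), and writing
G* for G with the factor $\binom{r_2'+R_2-1}{R_2}$ removed (so that G* does not
depend on R2), we write

$$E''(R_2) = G^*\sum_{j=0}^{n-k}\binom{n-k}{j}(n'-r_1'-r_2'-s_1'+n-k-j-1)^{[n-k-j]}\sum_{R_2=0}^{N-k}R_2\,R_2^{[j]}\binom{r_2'+R_2-1}{R_2}\binom{n'-r_2'+N-R_2-1}{N-n+j-R_2} \tag{6.6}$$

and, since $R_2\binom{r_2'+R_2-1}{R_2} = r_2'\binom{r_2'+R_2-1}{R_2-1}$,

$$E''(R_2) = G^*\,r_2'\sum_{j=0}^{n-k}\binom{n-k}{j}(n'-r_1'-r_2'-s_1'+n-k-j-1)^{[n-k-j]}\sum_{R_2-1=0}^{N-k-1}\bigl((R_2-1)+1\bigr)^{[j]}\binom{r_2'+R_2-1}{R_2-1}\binom{n'-r_2'+N-R_2-1}{N-n+j-R_2}. \tag{6.7}$$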
Note that

$$(x+1)^{[p]} = \sum_{k=0}^{p}\binom{p}{k}1^{[k]}x^{[p-k]} = x^{[p]} + p\,x^{[p-1]};$$

furthermore the last binomial coefficient is zero if R2-1 > N-n+j-1;
hence we can write (6.7) as
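$$E''(R_2) = G^*\,r_2'\sum_{j=0}^{n-k}\binom{n-k}{j}(n'-r_1'-r_2'-s_1'+n-k-j-1)^{[n-k-j]}\sum_{R_2-1=0}^{N-n+j-1}\Bigl[(R_2-1)^{[j]} + j\,(R_2-1)^{[j-1]}\Bigr]\binom{r_2'+R_2-1}{R_2-1}\binom{n'-r_2'+N-R_2-1}{N-n+j-1-R_2+1} \tag{6.8}$$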


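$$= G^*\,r_2'\sum_{j=0}^{n-k}\binom{n-k}{j}(n'-r_1'-r_2'-s_1'+n-k-j-1)^{[n-k-j]}\left[(r_2'+j)^{[j]}\binom{n'+N-1}{N-n-1} + j\,(r_2'+j-1)^{[j-1]}\binom{n'+N-1}{N-n}\right] \tag{6.9}$$

by a double application of Lemma 5.2 in the last summation (with a = r2'+1,
b = n'+n-j+1 and c = N-n+j-1). We rearrange terms to get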


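$$E''(R_2) = G^*\,r_2'\,(n-k)!\left[\binom{n'+N-1}{N-n-1}\sum_{j=0}^{n-k}\binom{n'-r_1'-r_2'-s_1'+n-k-j-1}{n-k-j}\binom{r_2'+1+j-1}{j} + \binom{n'+N-1}{N-n}\sum_{j-1=0}^{n-k-1}\binom{n'-r_1'-r_2'-s_1'+n-k-j-1}{n-k-j}\binom{r_2'+j-1}{j-1}\right] \tag{6.10}$$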
$$= G^*\,r_2'\,(n-k)!\left[\binom{n'+N-1}{N-n-1}\binom{n'-r_1'-s_1'+n-k}{n-k} + \binom{n'+N-1}{N-n}\binom{n'-r_1'-s_1'+n-k-1}{n-k-1}\right] \tag{6.11}$$


by application of Lemma 5.1 in each summation. Finally, evaluation of the
binomial coefficients and cancellation enables us to write

$$E''(R_2) = r_2'\left[\frac{(N-n)(n'-r_1'-s_1'+n-k)}{(n'-r_1'-s_1')(n'+n)} + \frac{n-k}{n'-r_1'-s_1'}\right]. \tag{6.12}$$

Further consideration of the ideal and unattainable case involving
the posterior mass function shown in (5.12) enables us to give an interesting
interpretation of (6.12). If the number of favorable non-respondents q were
known, then the posterior marginal expectation of R2-q, similar to (5.14),
would be

$$E''(R_2-q) = (N-n)\,\frac{r_2'+q}{n'+n} \tag{6.13}$$

or

$$E''(R_2) = (N-n)\,\frac{r_2'+q}{n'+n} + q. \tag{6.14}$$

Let us now set the actual expression for E''(R2) in (6.12) equal to the ideal
result in (6.14) and solve for q. We find that (6.12) can be rearranged in the
form

$$E''(R_2) = \frac{N-n}{n'+n}\left[r_2' + \frac{(n-k)\,r_2'}{r_2'+s_2'}\right] + \frac{(n-k)\,r_2'}{r_2'+s_2'}, \tag{6.15}$$

where s2' = n'-r1'-r2'-s1'.

Hence the term (n-k)r2'/(r2'+s2') plays the role of q. But what is this
quantity? It is the number of non-respondents in the sample multiplied by
the prior Beta expectation of the probability of a favorable observation,
conditional on non-response. (6.15) says, then, that if one wants E''(R2) in
the face of non-response, he should act as if he had observed the proportion
of favorable non-respondents implied by his prior distribution and use the
result of the ideal case.⁴
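A sketch comparing three routes to E''(R2): direct enumeration of the posterior, the closed form (6.12), and the "as if" rule (6.15), under which the sample non-respondents are split in the prior proportion r2'/(r2'+s2'). The helper names and numbers are our own illustrative choices.

```python
from math import comb, exp, lgamma

def nb(a, x):
    """C(a + x - 1, x) for real a > 0 and integer x (zero when x < 0)."""
    if x < 0:
        return 0.0
    return exp(lgamma(a + x) - lgamma(a) - lgamma(x + 1))

def e_r2_bruteforce(r, k, n, r1p, r2p, s1p, np_, N):
    """E''(R2) by enumerating prior (3.4) x likelihood (2.1), up to a constant."""
    s2p = np_ - r1p - r2p - s1p
    num = den = 0.0
    for R1 in range(N + 1):
        for R2 in range(N + 1 - R1):
            for S1 in range(N + 1 - R1 - R2):
                w = (nb(r1p, R1) * nb(r2p, R2) * nb(s1p, S1) * nb(s2p, N - R1 - R2 - S1)
                     * comb(R1, r) * comb(S1, k - r) * comb(N - R1 - S1, n - k))
                num += R2 * w
                den += w
    return num / den

def e_r2_closed(k, n, r1p, r2p, s1p, np_, N):
    """Equation (6.12)."""
    m = np_ - r1p - s1p
    return r2p * ((N - n) * (m + n - k) / (m * (np_ + n)) + (n - k) / m)

def e_r2_as_if(k, n, r1p, r2p, s1p, np_, N):
    """Ideal-case mean (6.14) with q replaced by its 'as if' value from (6.15)."""
    s2p = np_ - r1p - r2p - s1p
    q_as_if = (n - k) * r2p / (r2p + s2p)
    return (N - n) * (r2p + q_as_if) / (np_ + n) + q_as_if

r1p, r2p, s1p, np_, N = 1.5, 1.0, 2.0, 5.5, 12
n, k, r = 6, 2, 1
results = (e_r2_bruteforce(r, k, n, r1p, r2p, s1p, np_, N),
           e_r2_closed(k, n, r1p, r2p, s1p, np_, N),
           e_r2_as_if(k, n, r1p, r2p, s1p, np_, N))
print(results)
```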

With the posterior expectations of R1-r and R2 we have the basic
ingredients of R̂*, the optimal estimator, and the first-stage decision problem,
given n, is complete. If one wishes to evaluate the expected loss for any
particular problem, as given by (6.4), an expression for Var''(R1+R2) is required.
Var''(R1) is obtained directly from the formula for the Beta-binomial
distribution [5, p. 237], and Var''(R2) and Cov''(R1, R2) are obtained from rather
tedious but straightforward algebra. (We solved the problem by finding the
second factorial moment of R2 and E''[(R1-r)R2] by means of Lemma 5.2. The
result, however, is long, and we shall not present it here, although it lends
itself directly to computer programming for evaluation.) The use of a well-known
trick enables us to avoid the need for Var''(R1+R2) in the determination of the
optimal sample size.
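The first-stage decision can be illustrated numerically. Under the quadratic loss (6.1), the sketch below enumerates the posterior of R = R1+R2, takes R̂* as its mean (6.3), and checks that the minimum expected loss equals k_t Var''(R) + K_s + k_s n, as in (6.4), and that moving the estimate away from R̂* raises it. The cost constants k_t, K_s and k_s are hypothetical values chosen only for the example.

```python
from math import comb, exp, lgamma

def nb(a, x):
    """C(a + x - 1, x) for real a > 0 and integer x (zero when x < 0)."""
    if x < 0:
        return 0.0
    return exp(lgamma(a + x) - lgamma(a) - lgamma(x + 1))

def posterior_of_R(r, k, n, r1p, r2p, s1p, np_, N):
    """Posterior mass function of R = R1 + R2, enumerating prior (3.4) x likelihood (2.1)."""
    s2p = np_ - r1p - r2p - s1p
    post = {}
    for R1 in range(N + 1):
        for R2 in range(N + 1 - R1):
            for S1 in range(N + 1 - R1 - R2):
                prior = (nb(r1p, R1) * nb(r2p, R2) * nb(s1p, S1)
                         * nb(s2p, N - R1 - R2 - S1))      # (3.4) without its constant
                like = comb(R1, r) * comb(S1, k - r) * comb(N - R1 - S1, n - k)
                post[R1 + R2] = post.get(R1 + R2, 0.0) + prior * like
    total = sum(post.values())
    return {R: w / total for R, w in post.items()}

def expected_loss(R_hat, post, n, k_t, K_s, k_s):
    """Posterior expectation of the quadratic loss (6.1) for the estimate R_hat."""
    return k_t * sum(p * (R_hat - R) ** 2 for R, p in post.items()) + K_s + k_s * n

r1p, r2p, s1p, np_, N = 1.5, 1.0, 2.0, 5.5, 10
n, k, r = 5, 3, 2
k_t, K_s, k_s = 1.0, 5.0, 0.2                                  # hypothetical cost constants
post = posterior_of_R(r, k, n, r1p, r2p, s1p, np_, N)
R_star = sum(R * p for R, p in post.items())                   # (6.3)
var_R = sum(p * (R - R_star) ** 2 for R, p in post.items())    # Var''(R)
m = np_ - r1p - s1p
R_star_closed = (r + (N - n) * (r1p + r) / (np_ + n)           # r + (5.14)
                 + r2p * ((N - n) * (m + n - k) / (m * (np_ + n)) + (n - k) / m))  # + (6.12)
best = expected_loss(R_star, post, n, k_t, K_s, k_s)
print(R_star, R_star_closed, best, k_t * var_R + K_s + k_s * n)
```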

7. OPTIMAL SAMPLING WITHOUT FOLLOW-UP

If we intend to make a terminal estimate of R after the first sample
of k respondents has been interrogated, then the optimal sample size is
determined by choosing n to minimize the prior expected value of the posterior
expected loss; i.e.,

$$\min_n E'\min_{\hat R}E''L(\hat R,R,\tilde r,\tilde k,n) = \min_n\bigl[k_t\,E'\operatorname{Var}''(R) + K_s + k_s n\bigr], \tag{7.1}$$

by substitution from (6.4).

⁴ Note that with the multivariate extension of a rectangular prior mass
function, i.e. r1' = r2' = s1' = 1 and n' = 4, (6.15) implies that one should consider
half of the non-respondents to be favorable, a kind of balance that is not
surprising, since the posterior for R2 is highly dependent on prior assumptions.

In order to avoid direct formulation of $\mathrm{Var}''(R)$ and subsequent application of the prior expectation operator, we use the relation [5, p. 106]

$$E'\,\mathrm{Var}''(R) = \mathrm{Var}'(R) - \mathrm{Var}'\,E''(R). \qquad (7.2)$$

Working on the second term on the right-hand side above, we combine expressions (5.14) and (6.12) and write

$$E''(R) = \frac{(N-n)(r_1'+\tilde r)}{n'+n} + \tilde r + \frac{r_2'(N-n)(n'-r_1'-s_1'+n-\tilde k) + r_2'(n'+n)(n-\tilde k)}{(n'+n)(n'-r_1'-s_1')} \qquad (7.3)$$

where the tilde reminds the reader that the statistics r and k are the relevant random variables at the prior position in the analysis. It is therefore the variances and covariance of $\tilde r$ and $\tilde k$ that induce the prior variance of $E''(R)$. Rearranging and omitting terms that do not involve $\tilde r$ and $\tilde k$, we operate on (7.3) to obtain


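$$\mathrm{Var}'E''(R) = \frac{(N+n')^2}{(n'+n)^2}\left[\mathrm{Var}'(\tilde r) + \frac{r_2'^2\,\mathrm{Var}'(\tilde k)}{(n'-r_1'-s_1')^2} - \frac{2r_2'\,\mathrm{Cov}'(\tilde r,\tilde k)}{n'-r_1'-s_1'}\right] \qquad (7.4)$$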
Since the predictive distribution of $\tilde r$ and $\tilde k - \tilde r$ is Dirichlet-multinomial (5.3), with parameters $r_1'$, $s_1'$, and $n'$, one can verify that

$$\mathrm{Var}'(\tilde r) = \frac{n(n+n')\,r_1'(n'-r_1')}{n'^2(n'+1)}, \qquad (7.5)$$

$$\mathrm{Var}'(\tilde k) = \frac{n(n+n')(r_1'+s_1')(n'-r_1'-s_1')}{n'^2(n'+1)}, \qquad (7.6)$$

$$\mathrm{Cov}'(\tilde r,\tilde k) = \frac{n(n+n')\,r_1'(n'-r_1'-s_1')}{n'^2(n'+1)}. \qquad (7.7)$$

Substitution of (7.5), (7.6), and (7.7) into (7.4) yields

$$\mathrm{Var}'E''(R) = \frac{n(N+n')^2}{(n+n')\,n'^2(n'+1)}\left[r_1'(n'-r_1') + \frac{r_2'^2(r_1'+s_1')}{n'-r_1'-s_1'} - 2r_1'r_2'\right]. \qquad (7.8)$$
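As a numerical check on (7.4) through (7.8), the sketch below evaluates $\mathrm{Var}'E''(R)$ two ways: from the closed form (7.8), and by brute-force summation over the Dirichlet-multinomial predictive distribution of $(\tilde r, \tilde k-\tilde r, n-\tilde k)$ with parameters $(r_1', s_1', n'-r_1'-s_1')$, applying (7.3) at each point. It assumes the standard Dirichlet-multinomial mass function; the helper names and the parameter values are arbitrary illustrations.

```python
from math import exp, lgamma


def dm_pmf(x, alpha):
    """Dirichlet-multinomial probability of the count vector x."""
    n, a0 = sum(x), sum(alpha)
    logp = lgamma(n + 1) + lgamma(a0) - lgamma(n + a0)
    for xi, ai in zip(x, alpha):
        logp += lgamma(xi + ai) - lgamma(ai) - lgamma(xi + 1)
    return exp(logp)


def var_prepost_closed(r1p, r2p, s1p, n_prime, N, n):
    """Var'E''(R) as given by (7.8)."""
    d = n_prime - r1p - s1p                      # n' - r1' - s1'
    bracket = r1p * (n_prime - r1p) + r2p**2 * (r1p + s1p) / d - 2 * r1p * r2p
    return n * (N + n_prime) ** 2 / ((n + n_prime) * n_prime**2 * (n_prime + 1)) * bracket


def var_prepost_enumerated(r1p, r2p, s1p, n_prime, N, n):
    """Var'E''(R) by summing over the predictive distribution of (r, k)."""
    d = n_prime - r1p - s1p
    alpha = (r1p, s1p, d)      # favorable resp., unfavorable resp., non-resp.
    m1 = m2 = 0.0
    for k in range(n + 1):
        for r in range(k + 1):
            p = dm_pmf((r, k - r, n - k), alpha)
            # E''(R) from (7.3), evaluated at this (r, k)
            e = (r + (N - n) * (r1p + r) / (n_prime + n)
                 + (r2p * (N - n) * (d + n - k) + r2p * (n_prime + n) * (n - k))
                 / ((n_prime + n) * d))
            m1 += p * e
            m2 += p * e * e
    return m2 - m1 * m1


args = dict(r1p=1.2, r2p=0.8, s1p=0.9, n_prime=4.0, N=100, n=10)
print(var_prepost_closed(**args), var_prepost_enumerated(**args))
```

Because (7.3) is linear in $\tilde r$ and $\tilde k$, the two numbers should agree to rounding error.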

Now, using (7.2), our prior expected posterior expected loss in (7.1) becomes

$$k_t\,\mathrm{Var}'(R) - k_t\,\mathrm{Var}'E''(R) + K_s + k_s n. \qquad (7.9)$$

The first term above, $k_t\,\mathrm{Var}'(R)$, is merely a constant times the prior variance of the Beta-binomially distributed sum $R_1 + R_2$, and is not a function of n, the sample size; i.e.,

$$\mathrm{Var}'(R) = \frac{N(N+n')(r_1'+r_2')(n'-r_1'-r_2')}{n'^2(n'+1)}. \qquad (7.10)$$

In minimizing (7.9), then, we need only concern ourselves with expression (7.8) with a negative sign in front of it. In order to solve for the optimal n, we differentiate

$$k_s n - k_t\,\mathrm{Var}'E''(R) \qquad (7.11)$$

with respect to n, set the result equal to zero, and obtain

$$n^* = \left(\frac{k_t A}{k_s}\right)^{1/2} - n', \qquad (7.12)$$

where

$$A = \frac{(N+n')^2}{n'(n'+1)}\left[r_1'(n'-r_1') + \frac{r_2'^2(r_1'+s_1')}{n'-r_1'-s_1'} - 2r_1'r_2'\right]. \qquad (7.13)$$
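A minimal sketch of the first-stage design calculation follows: it computes the continuous optimum $n^*$ of (7.12)-(7.13), evaluates the prior expected loss (7.9) there, and compares it with the no-sampling loss $k_t\,\mathrm{Var}'(R)$ of (7.14). Clipping $n^*$ to $[0, N]$ is our assumption (the objective (7.11) is convex in n, so the clipped value is the constrained optimum of the continuous problem); rounding to an integer is left open, as in the text. Function names and the numerical inputs are illustrative only.

```python
from math import sqrt


def var_prior_R(r1p, r2p, s1p, n_prime, N):
    """Var'(R) from (7.10); s1p is unused but kept for a uniform signature."""
    return N * (N + n_prime) * (r1p + r2p) * (n_prime - r1p - r2p) / (n_prime**2 * (n_prime + 1))


def bracket(r1p, r2p, s1p, n_prime):
    """Bracketed factor shared by (7.8) and (7.13)."""
    d = n_prime - r1p - s1p
    return r1p * (n_prime - r1p) + r2p**2 * (r1p + s1p) / d - 2 * r1p * r2p


def var_prepost(r1p, r2p, s1p, n_prime, N, n):
    """Var'E''(R) from (7.8)."""
    return (n * (N + n_prime) ** 2 / ((n + n_prime) * n_prime**2 * (n_prime + 1))
            * bracket(r1p, r2p, s1p, n_prime))


def expected_loss(n, r1p, r2p, s1p, n_prime, N, k_t, K_s, k_s):
    """Prior expected posterior expected loss (7.9) for a sample of size n."""
    return (k_t * (var_prior_R(r1p, r2p, s1p, n_prime, N)
                   - var_prepost(r1p, r2p, s1p, n_prime, N, n))
            + K_s + k_s * n)


def optimal_n(r1p, r2p, s1p, n_prime, N, k_t, k_s):
    """Continuous n* from (7.12)-(7.13), clipped to [0, N] (our assumption)."""
    A = (N + n_prime) ** 2 / (n_prime * (n_prime + 1)) * bracket(r1p, r2p, s1p, n_prime)
    return min(max(sqrt(k_t * A / k_s) - n_prime, 0.0), float(N))


prior = dict(r1p=1.0, r2p=1.0, s1p=1.0, n_prime=4.0, N=10_000)
costs = dict(k_t=1.0, K_s=500.0, k_s=10.0)
n_star = optimal_n(**prior, k_t=costs["k_t"], k_s=costs["k_s"])
loss_sample = expected_loss(n_star, **prior, **costs)
loss_no_sample = costs["k_t"] * var_prior_R(**prior)       # (7.14)
print(n_star, loss_sample, loss_no_sample, loss_sample < loss_no_sample)
```

The final comparison implements the rule stated below: sample only if the no-sampling loss exceeds the loss at $n^*$.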
With formula (7.12) and the necessary parametric inputs and costs, we calculate the optimal sample size $n^*$, along with the prior expected loss of the implied experiment given by (7.1). Before we decide to sample, however, we must compare (7.1) with the expected loss of using the prior expectation, $E'(R_1+R_2)$, to estimate R without sampling. With n = 0, (7.8) is, of course, zero and

$$E'(\text{loss of estimation without sampling}) = k_t\,\mathrm{Var}'(R). \qquad (7.14)$$

Hence, if (7.14) is greater than (7.1), we choose the experiment "Take a sample of $n^*$ given by (7.12)".

We remark that the relationship in (7.2) applies to any decision problem involving distributions for which the required expectations and variances exist; and it is usually the case that as the sample size n approaches its limit, $\mathrm{Var}'E''(R)$ approaches $\mathrm{Var}'(R)$, i.e., the preposterior variance approaches the prior variance, and $E'\,\mathrm{Var}''(R)$ approaches zero. This makes sense, since with a large enough sample we expect little variation in our posterior distribution [see 5, p. 106].

In this problem, however, the limit of (7.8) as n approaches N is not equal to (7.10). The reader can verify that

$$\lim_{n\to N}\mathrm{Var}'E''(R) < \mathrm{Var}'(R). \qquad (7.15)$$

Even when the population is exhausted by the sample, we cannot estimate R with total precision if the non-respondents will not talk.
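For readers who wish to carry out the verification of (7.15), one route is to set n = N in (7.8) and subtract the result from (7.10), using $s_2' = n'-r_1'-r_2'-s_1'$ (so that $n'-r_1'-s_1' = r_2'+s_2'$ and $n'-r_2' = r_1'+s_1'+s_2'$). The bracketed terms in $r_1'$ cancel, which leaves the size of the gap explicitly:

$$
\mathrm{Var}'(R) - \lim_{n\to N}\mathrm{Var}'E''(R)
= \frac{N(N+n')}{n'^2(n'+1)}\left[r_2'(n'-r_2') - \frac{r_2'^2(r_1'+s_1')}{r_2'+s_2'}\right]
= \frac{N(N+n')\,r_2'\,s_2'}{n'(n'+1)(r_2'+s_2')} .
$$

The gap is strictly positive whenever $r_2' > 0$ and $s_2' > 0$, i.e., whenever the prior admits both favorable and unfavorable non-respondents.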

8.   FOLLOW-UP  SAMPLING 
One way to get the non-respondents to talk is to sample again from the n-k who did not respond in the first experiment. We wish to select m of them at random, where m can range from 0 to n-k, and by means of a follow-up survey observe the number q of favorable attitudes. We assume that in the second-stage sample of m there is total response.

The likelihood of q favorables in the second sample of m is given by

$$P(q \mid m, R_2, N-R_1-S_1) = f_h(q \mid m, R_2, N-R_1-S_1) = \frac{\binom{R_2}{q}\binom{N-R_1-S_1-R_2}{m-q}}{\binom{N-R_1-S_1}{m}}. \qquad (8.1)$$


According to Bayes' theorem,

$$P'''(R_1-r,\,R_2-q,\,S_1-k+r) \propto P''(R_1-r,\,R_2,\,S_1-k+r)\; f_h(q \mid m, R_2, N-R_1-S_1), \qquad (8.2)$$

where the triple prime denotes "second-stage posterior".

The first factor on the right-hand side of (8.2) is the joint first-stage posterior given in (5.11). In the second stage of the analysis, (5.11) plays the role of a prior distribution, and we obtain the second-stage posterior in the usual way. The expressions in the remainder of this section were obtained by operations parallel to those employed in the development of the results in Section 5. We shall therefore omit the mathematical detail and state only the final results and their interpretation.


The assumption of total response in the follow-up sample can be relaxed by the prior specification of more than four subgroups in the population, e.g., the number of favorables who respond at the first stage, the number of unfavorables who do not respond at the first stage but do respond at the second, etc. The analysis would proceed in the same way, but would, of course, involve additional parameters.
The second-stage posterior mass function is

$$P'''(R_1-r,\,R_2-q,\,S_1-k+r \mid r,k,q,n,m,r_1',r_2',s_1',n',N) = \frac{\binom{r_1'+R_1-1}{R_1-r}\binom{r_2'+R_2-1}{R_2-q}\binom{s_1'+S_1-1}{S_1-k+r}\binom{n'-r_1'-r_2'-s_1'+N-R_1-R_2-S_1-1}{N-R_1-R_2-S_1-m+q}\binom{N-R_1-S_1-m}{n-k-m}}{\binom{n'+N-1}{N-n}\binom{n'-r_1'-s_1'+n-k-1}{n'-r_1'-s_1'+m-1}} \qquad (8.3)$$
We derived (8.3) by dividing the product on the right-hand side of (8.2) by the predictive probability of q favorables in a sample of m:

$$P''(q \mid m, r_1', r_2', s_1', n', N) = f_{\beta b}(q \mid m, r_2', n'-r_1'-s_1'). \qquad (8.4)$$

From the posterior mass function in (8.3) we obtain, by summation over $R_2 - q$,

$$P'''(R_1-r,\,S_1-k+r) = f_{Dm}(R_1-r,\,S_1-k+r \mid r_1'+r,\,s_1'+k-r,\,n'+n,\,N-n). \qquad (8.5)$$

Note that (8.5) is exactly the same as the first-stage posterior distribution for $(R_1-r,\,S_1-k+r)$ given by (5.13). In other words, the follow-up sample carries no additional information about the respondents, since second-stage sampling was restricted to non-respondents only. It follows that the second-stage posterior expectation of $R_1 - r$ is the same as in (5.14).

For the second-stage posterior marginal distribution of $R_2 - q$ we have

$$P'''(R_2-q \mid r,k,q,n,m,r_1',r_2',s_1',n',N) = \sum_{j=0}^{n-k-m} u_j\, f_{\beta b}(R_2-q-j \mid r_2'+q+j,\,n'+n,\,N-n), \qquad (8.6)$$

where

$$u_j = f_{\beta b}(j \mid r_2'+q,\,n'-r_1'-s_1'+m). \qquad (8.7)$$
Analogous to the interpretation of (5.29), we can view (8.6) as a weighted average of Beta-binomial probabilities, each corresponding to the marginal posterior probability resulting from a different number of favorables among the n-k-m remaining non-respondents. The weight $u_j$ is also Beta-binomial: it is the predictive probability of j favorable non-respondents in a sample of n-k-m after the follow-up sample of m has been observed.

Finally, the second-stage posterior expected value of $R_2 - q$ is given by

$$E'''(R_2-q) = \frac{N-n}{n'+n}\left[r_2'+q + \frac{(n-k-m)(r_2'+q)}{r_2'+s_2'+m}\right] + \frac{(n-k-m)(r_2'+q)}{r_2'+s_2'+m}, \qquad (8.8)$$

where $s_2' = n'-r_1'-r_2'-s_1'$.


Observe that (8.8) is of the same general form as (6.15), except that q favorable non-respondents have now been observed, and the problem is to estimate the number of remaining non-respondents that are favorable. The Beta parameters for favorable non-response have been updated to $r_2'+q$ and $r_2'+s_2'+m$, and the second-stage prior expected number of favorables in the remaining n-k-m is given by the last term of (8.8),

$$\frac{(n-k-m)(r_2'+q)}{r_2'+s_2'+m}. \qquad (8.9)$$

Hence (8.9) plays the role of the additional favorables that one would expect to observe based on his second-stage predictive distribution for favorability among the non-respondents.
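A small check of (8.6) through (8.8), sketched in Python: the mean of the mixture (8.6), with Beta-binomial weights $u_j$ from (8.7), should agree with the closed form (8.8). The decomposition used is $R_2 - q = j + (R_2 - q - j)$, where the second part has Beta-binomial mean $(N-n)(r_2'+q+j)/(n'+n)$. The Beta-binomial pmf is written out from its standard definition; all function names and parameter values are hypothetical.

```python
from math import exp, lgamma


def beta_binom_pmf(x, size, a, total):
    """Beta-binomial P(x | size) with Beta parameters (a, total - a)."""
    b = total - a
    log_comb = lgamma(size + 1) - lgamma(x + 1) - lgamma(size - x + 1)
    log_beta = (lgamma(x + a) + lgamma(size - x + b) - lgamma(size + total)
                - lgamma(a) - lgamma(b) + lgamma(total))
    return exp(log_comb + log_beta)


def e3_closed(r1p, r2p, s1p, n_prime, N, n, k, m, q):
    """E'''(R2 - q) from (8.8); note r2' + s2' = n' - r1' - s1'."""
    d = n_prime - r1p - s1p
    t = (n - k - m) * (r2p + q) / (d + m)          # (8.9)
    return (N - n) / (n_prime + n) * (r2p + q + t) + t


def e3_mixture(r1p, r2p, s1p, n_prime, N, n, k, m, q):
    """The same mean computed from the mixture (8.6) with weights (8.7)."""
    d = n_prime - r1p - s1p
    total = 0.0
    for j in range(n - k - m + 1):
        u_j = beta_binom_pmf(j, n - k - m, r2p + q, d + m)
        # j favorables among the remaining sample non-respondents, plus the
        # posterior mean number of favorable non-respondents outside the sample.
        total += u_j * (j + (N - n) * (r2p + q + j) / (n_prime + n))
    return total


args = dict(r1p=1.0, r2p=1.0, s1p=1.0, n_prime=4.0, N=100, n=10, k=4, m=3, q=1)
print(e3_closed(**args), e3_mixture(**args))
```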

Expressions (8.3) through (8.9) are the principal second-stage posterior results that are required for subsequent developments. The reader should note that when m, and therefore q, are equal to zero, corresponding to the decision to omit follow-up sampling, (8.3) through (8.9) reduce to the corresponding results in Section 5.
9.   OPTIMAL  SECOND  STAGE  SAMPLING
We now extend the definition of the loss function in (6.1) to include the cost of the follow-up sample:

$$L(\hat R, R, r, k, q, n, m) = k_t(\hat R - R)^2 + K_s + k_s n + k_f m, \qquad (9.1)$$

where $k_f$ is the incremental cost per follow-up observation, so that $k_f m$ is the cost of the m follow-up observations, and the other arguments are defined as before.

With the quadratic loss function in (9.1), the "best" estimator of $R = R_1 + R_2$ is still the mean of the posterior distribution of R, but now we require the second-stage posterior distribution. Hence

$$\hat R^* = E'''(R_1+R_2) = E''(R_1) + E'''(R_2), \qquad (9.2)$$

obtained from (5.14) and (8.8). The expected loss associated with the estimator in (9.2) is given by

$$\min_{\hat R} E'''\, l(\hat R, R, r, k, q, n, m) = k_t\,\mathrm{Var}'''(R) + K_s + k_s n + k_f m, \qquad (9.3)$$

similar to (6.4).

The next step is to consider the problem of choosing m out of n-k at the end of the first stage, where k respondents have been recorded. We want to pick m in a way that minimizes the expected value of (9.3), i.e.,

$$\min_m E''\min_{\hat R} E'''\, l(\hat R, R, r, k, q, n, m) = \min_m\left[k_t\,E''\,\mathrm{Var}'''(R) + K_s + k_s n + k_f m\right]. \qquad (9.4)$$

Note that n is given, having somehow been determined at the first stage. In fact, we can think of the choice of m as part of a new decision problem, all of the information from the first stage having been incorporated in the first-stage joint posterior distribution and its parameters, which form the starting point for the follow-up analysis.

The determination of the optimal value of m proceeds as in Section 7. We first obtain $E''\,\mathrm{Var}'''(R)$ via the relationship in (7.2):

$$E''\,\mathrm{Var}'''(R) = \mathrm{Var}''(R) - \mathrm{Var}''E'''(R). \qquad (9.5)$$

Since $\mathrm{Var}''(R)$, the first-stage posterior variance, does not involve m, we need only minimize the expression

$$k_f m - k_t\,\mathrm{Var}''E'''(R). \qquad (9.6)$$


Explicit formulation of $E'''(R)$ using (5.14) and (8.8), and the treatment of q as the relevant random variable with distribution given by (8.4), enables one to find

$$\mathrm{Var}''E'''(R) = \frac{(N+n')^2\,(n'-r_1'-s_1'+n-k)^2\; m\, r_2'(n'-r_1'-s_1'-r_2')}{(n'+n)^2\,(n'+m-r_1'-s_1')\,(n'-r_1'-s_1')^2\,(n'-r_1'-s_1'+1)} \qquad (9.7)$$

$$= \frac{Bm}{(n'-r_1'-s_1')(n'-r_1'-s_1'+m)}, \qquad (9.8)$$

where

$$B = \frac{(N+n')^2\,(n'-r_1'-s_1'+n-k)^2}{(n'+n)^2\,(n'-r_1'-s_1'+1)}\cdot\frac{r_2'(n'-r_1'-s_1'-r_2')}{n'-r_1'-s_1'}. \qquad (9.9)$$
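The following sketch checks (9.7) through (9.9) numerically. It treats q as Beta-binomial with parameters $(m;\, r_2',\, n'-r_1'-s_1')$ as in (8.4), computes by direct summation the variance of the q-dependent part of $E'''(R)$ (that is, q plus (8.8), since by (8.5) $E'''(R_1)$ does not involve q), and compares it with $Bm/[(n'-r_1'-s_1')(n'-r_1'-s_1'+m)]$. Names and inputs are illustrative.

```python
from math import exp, lgamma


def beta_binom_pmf(x, size, a, total):
    """Beta-binomial P(x | size) with Beta parameters (a, total - a)."""
    b = total - a
    log_comb = lgamma(size + 1) - lgamma(x + 1) - lgamma(size - x + 1)
    log_beta = (lgamma(x + a) + lgamma(size - x + b) - lgamma(size + total)
                - lgamma(a) - lgamma(b) + lgamma(total))
    return exp(log_comb + log_beta)


def b_coef(r1p, r2p, s1p, n_prime, N, n, k):
    """B of (9.9)."""
    d = n_prime - r1p - s1p
    return ((N + n_prime) ** 2 * (d + n - k) ** 2 / ((n_prime + n) ** 2 * (d + 1))
            * r2p * (d - r2p) / d)


def var_pre2_closed(r1p, r2p, s1p, n_prime, N, n, k, m):
    """Var''E'''(R) via (9.8)."""
    d = n_prime - r1p - s1p
    return b_coef(r1p, r2p, s1p, n_prime, N, n, k) * m / (d * (d + m))


def var_pre2_enumerated(r1p, r2p, s1p, n_prime, N, n, k, m):
    """Var''E'''(R) by summing over q, distributed as in (8.4)."""
    d = n_prime - r1p - s1p
    m1 = m2 = 0.0
    for q in range(m + 1):
        p = beta_binom_pmf(q, m, r2p, d)
        t = (n - k - m) * (r2p + q) / (d + m)
        e = q + (N - n) / (n_prime + n) * (r2p + q + t) + t   # q-dependent part of E'''(R)
        m1 += p * e
        m2 += p * e * e
    return m2 - m1 * m1


a = dict(r1p=1.0, r2p=1.0, s1p=1.0, n_prime=4.0, N=100, n=10, k=4, m=3)
print(var_pre2_closed(**a), var_pre2_enumerated(**a))
```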


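We can substitute (9.7) into (9.6) and differentiate with respect to m, but because m is restricted to the range [0, n-k], we must also examine the sign of the derivative at the end points of the interval. Setting the derivative of (9.6), $k_f - k_t B/(n'-r_1'-s_1'+m)^2$, equal to zero gives $n'-r_1'-s_1'+m = (k_t B/k_f)^{1/2}$; since (9.6) is convex in m, the constrained optimum is this value clipped to the interval. In summarizing our results we borrow a useful approach from Ericson [7], who encounters a similar problem.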




Define

$$\pi = \frac{(\theta B)^{1/2} - (n'-r_1'-s_1')}{n-k}. \qquad (9.10)$$


From the optimization procedures we obtain the following decision rule for the choice of the optimal value of m:

$$m^* = \begin{cases} 0 & \text{if } \pi \le 0,\\ n-k & \text{if } \pi \ge 1,\\ \pi\,(n-k) & \text{if } 0 < \pi < 1. \end{cases} \qquad (9.11)$$

We can thus call π the optimal proportion of the n-k non-respondents to be sampled at the second stage.
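The decision rule (9.10)-(9.11) is straightforward to program. The sketch below is one way to do it (m is kept continuous, without rounding, which is our simplification). It reproduces individual entries of the tables that follow, e.g. π = 1.180 in Table 1 for (n-k)/n = .1, $(r_1'+s_1')/n'$ = .1 and $r_2'/(n'-r_1'-s_1')$ = .5. Because B depends on $r_1'$ and $s_1'$ only through their sum, the split between them in the example is arbitrary. Function names are hypothetical.

```python
from math import sqrt


def b_coef(r1p, r2p, s1p, n_prime, N, n, k):
    """B of (9.9); depends on r1' and s1' only through n' - r1' - s1'."""
    d = n_prime - r1p - s1p
    return ((N + n_prime) ** 2 * (d + n - k) ** 2 / ((n_prime + n) ** 2 * (d + 1))
            * r2p * (d - r2p) / d)


def optimal_proportion(r1p, r2p, s1p, n_prime, N, n, k, theta):
    """pi of (9.10), with theta = k_t / k_f as in (9.12)."""
    d = n_prime - r1p - s1p
    return (sqrt(theta * b_coef(r1p, r2p, s1p, n_prime, N, n, k)) - d) / (n - k)


def optimal_m(r1p, r2p, s1p, n_prime, N, n, k, theta):
    """Decision rule (9.11); m is kept continuous (no rounding)."""
    pi = optimal_proportion(r1p, r2p, s1p, n_prime, N, n, k, theta)
    if pi <= 0:
        return 0.0
    if pi >= 1:
        return float(n - k)
    return pi * (n - k)


# Table 1 cell: n' = 4, N = 100, n = 10, theta = .1, (n - k)/n = .1 (k = 9),
# (r1' + s1')/n' = .1 (split here as 0.2 + 0.2), r2'/(n' - r1' - s1') = .5.
pi = optimal_proportion(0.2, 1.8, 0.2, 4.0, 100, 10, 9, 0.1)
print(round(pi, 3), optimal_m(0.2, 1.8, 0.2, 4.0, 100, 10, 9, 0.1))
```

Setting $r_2' = 0$ makes B vanish, so π reduces to $-(n'-r_1'-s_1')/(n-k)$; this is the closed-form value of the zero column in the tables.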

In order to show some aspects of the behavior of π as a function of the specified parameters and the sample results, we present Tables 1, 2, and 3. The entries in the first row at the top of each table are the assumed values of n', N, n, and θ, where

$$\theta = \frac{k_t}{k_f}, \qquad (9.12)$$

the ratio of the cost associated with estimation error to the incremental cost of follow-up sampling.

Each column of the interior of the tables corresponds to an assumed value of $r_2'/(n'-r_1'-s_1')$, which we can call the prior favorability ratio among non-respondents. This ratio is incremented in steps of one-tenth up to the value of 0.5.

The  entries  in  the  tables  were  computed  and  printed  with  JOSS,  the 
on-line  computer  of  the  RAND  Corporation,  Santa  Monica,  California. 
TABLE 1
Selected Values of π

n' = 4.00     N = 100     n = 10     θ = .1000

                                     r2'/(n'-r1'-s1')
(n-k)/n  (r1'+s1')/n'      .0      .1      .2      .3      .4      .5

  .1           .1      -3.600   -.732    .224    .781   1.083   1.180
               .3      -2.800   -.501    .265    .711    .954   1.031
               .5      -2.000   -.274    .302    .637    .819    .877
               .7      -1.200   -.055    .327    .549    .670    .708
               .9       -.400    .127    .303    .406    .461    .479

  .3           .1      -1.200    .172    .629    .895   1.040   1.086
               .3       -.933    .236    .626    .853    .977   1.016
               .5       -.667    .292    .612    .798    .899    .932
               .7       -.400    .329    .572    .713    .790    .814
               .9       -.133    .294    .436    .519    .564    .578

  .5           .1       -.720    .352    .710    .918   1.031   1.067
               .3       -.560    .384    .698    .882    .981   1.013
               .5       -.400    .406    .674    .831    .916    .943
               .7       -.240    .405    .621    .746    .814    .836
               .9       -.080    .327    .462    .541    .584    .598

  .7           .1       -.514    .430    .744    .928   1.027   1.059
               .3       -.400    .447    .729    .894    .983   1.012
               .5       -.286    .454    .701    .844    .922    .947
               .7       -.171    .438    .642    .760    .824    .845
               .9       -.057    .341    .474    .551    .593    .607

  .9           .1       -.400    .473    .764    .933   1.025   1.055
               .3       -.311    .482    .746    .900    .984   1.011
               .5       -.222    .481    .715    .852    .926    .950
               .7       -.133    .457    .653    .768    .830    .850
               .9       -.044    .349    .480    .557    .598    .611

TABLE 2
Selected Values of π

n' = 4.00     N = 10000     n = 1000     θ = .1000

                                     r2'/(n'-r1'-s1')
(n-k)/n  (r1'+s1')/n'      .0      .1      .2      .3      .4      .5

  .1           .1       -.036    .830   1.119   1.287   1.379   1.408
               .3       -.028    .806   1.084   1.246   1.334   1.362
               .5       -.020    .767   1.030   1.183   1.266   1.292
               .7       -.012    .695    .930   1.067   1.142   1.166
               .9       -.004    .503    .672    .771    .824    .841

  .3           .1       -.012    .834   1.116   1.281   1.370   1.398
               .3       -.009    .810   1.083   1.242   1.328   1.356
               .5       -.007    .770   1.029   1.180   1.262   1.288
               .7       -.004    .697    .931   1.067   1.141   1.164
               .9       -.001    .505    .673    .772    .825    .842

  .5           .1       -.007    .835   1.116   1.279   1.368   1.397
               .3       -.006    .810   1.082   1.241   1.327   1.354
               .5       -.004    .771   1.029   1.180   1.261   1.288
               .7       -.002    .697    .931   1.067   1.140   1.164
               .9       -.001    .505    .673    .772    .825    .842

  .7           .1       -.005    .835   1.116   1.279   1.367   1.396
               .3       -.004    .811   1.082   1.240   1.326   1.354
               .5       -.003    .771   1.029   1.179   1.261   1.287
               .7       -.002    .698    .931   1.067   1.140   1.164
               .9       -.001    .505    .674    .772    .825    .842

  .9           .1       -.004    .836   1.115   1.278   1.367   1.395
               .3       -.003    .811   1.082   1.240   1.326   1.353
               .5       -.002    .771   1.029   1.179   1.261   1.287
               .7       -.001    .698    .931   1.067   1.140   1.164
               .9        .000    .505    .674    .772    .825    .842
TABLE 3
Selected Values of π

n' = 50.00     N = 10000     n = 1000     θ = .1000

                                     r2'/(n'-r1'-s1')
(n-k)/n  (r1'+s1')/n'      .0      .1      .2      .3      .4      .5

  .1           .1       -.450    .852   1.286   1.539   1.677   1.720
               .3       -.350    .859   1.262   1.496   1.624   1.664
               .5       -.250    .863   1.234   1.450   1.568   1.605
               .7       -.150    .861   1.198   1.394   1.501   1.535
               .9       -.050    .820   1.110   1.279   1.371   1.401

  .3           .1       -.150    .883   1.227   1.428   1.537   1.571
               .3       -.117    .883   1.216   1.411   1.516   1.550
               .5       -.083    .881   1.203   1.390   1.492   1.524
               .7       -.050    .873   1.181   1.360   1.458   1.489
               .9       -.017    .826   1.107   1.271   1.359   1.388

  .5           .1       -.090    .889   1.215   1.405   1.509   1.542
               .3       -.070    .888   1.207   1.393   1.494   1.527
               .5       -.050    .885   1.197   1.378   1.477   1.508
               .7       -.030    .876   1.177   1.353   1.449   1.479
               .9       -.010    .827   1.106   1.269   1.357   1.385

  .7           .1       -.064    .892   1.210   1.396   1.497   1.529
               .3       -.050    .890   1.203   1.386   1.485   1.517
               .5       -.036    .886   1.194   1.373   1.470   1.501
               .7       -.021    .877   1.176   1.350   1.445   1.475
               .9       -.007    .828   1.106   1.268   1.356   1.384

  .9           .1       -.050    .893   1.207   1.390   1.490   1.522
               .3       -.039    .891   1.201   1.382   1.480   1.511
               .5       -.028    .887   1.192   1.370   1.467   1.497
               .7       -.017    .877   1.175   1.349   1.443   1.473
               .9       -.006    .828   1.106   1.268   1.356   1.384
The successive blocks of five rows correspond to increasing values of (n-k)/n, the sample non-response ratio. Finally, within each block we let the rows correspond to increasing values of $(r_1'+s_1')/n'$, the prior probability of response.

Examination of each of the three tables shows the following:

1. As $r_2'/(n'-r_1'-s_1')$ increases to 0.5, π increases. (The zero column was included as a check on the computations.) The second factor of B in (9.9) shows that when $r_2'/(n'-r_1'-s_1')$ is greater than 0.5 there is a symmetric decrease in π as it approaches 1.0. This says that the closer to 0.5 we judge the probability of a favorable attitude among non-respondents to be a priori, the more there is to gain from a follow-up sample, other things equal.

2. As $(r_1'+s_1')/n'$ increases, π decreases; i.e., the greater the prior probability of response, the less there is to gain from additional sampling.

3. As (n-k)/n increases, π increases. Observe, however, that when N and n are large the effect is negligible, and even in Table 1 a great change in (n-k)/n is required to make a difference of one observation in the value of π.

Although not explicitly shown in the tables selected for illustration here, it is clear from examination of (9.10) that as $\theta = k_t/k_f$ increases, π increases, as we would expect. Furthermore, holding other values fixed, an increase in n results in a decrease in π. (This makes sense, since for a given value of π, m is larger.)

Next, comparing Tables 1 and 2, we see that, holding the ratio n/N fixed, an increase in N results in an increase in π. (The increase would be even greater if n remained at 10.) It appears that the increase in the marginal variances of the joint prior distribution brought about by increasing N makes follow-up sampling more attractive as a means of reducing uncertainty.

Finally, a comparison of Tables 2 and 3 shows that when n' increases, π is increased throughout. This phenomenon is puzzling because an increase in n' implies a "tighter" prior distribution for $(R_1, R_2, S_1)$, and the effect on π is the opposite of that which results when prior uncertainty is reduced by decreasing N. (See the paragraph above.) Conversely, if we decrease n' sufficiently, i.e., make the prior diffuse enough, we can drive π to zero. We conclude that if uncertainty is due to the large size of the population, then a follow-up sample of a given size is of greater potential informative value in terms of reducing expected loss than it is when uncertainty is due to the subjective choice of n'. One might say that the more dogmatic our subjective assessment, the more we are willing to spend in order to find out whether we are right or wrong; whereas the more subjectively diffuse we are, the less valuable is second-stage sampling.

10.   OPTIMAL  INITIAL  SAMPLE  SIZE  WITH  FOLLOW-UP 

The determination of optimal n in Section 7 did not take into account the possibility of follow-up sampling. In order to determine n for an analysis that may not terminate at the end of the first stage, we must minimize the first-stage prior expectation of the second-stage posterior expected loss. We begin by considering the choice of the optimal value of m according to the rules of Section 9. Using (9.4), (9.5), (9.8), and (9.9), we write

$$\min_m\left[k_t\,E''\,\mathrm{Var}'''(R) + K_s + k_s n + k_f m\right] = k_t\left[\mathrm{Var}''(R) - \frac{B(k)\,m^*(k)}{(n'-r_1'-s_1')(n'-r_1'-s_1'+m^*(k))}\right] + K_s + k_s n + k_f\,m^*(k), \qquad (10.1)$$
where $m^*(k)$ is the optimal value of m, determined by (9.10) and (9.11), and is a function of k, the number of respondents in the sample. Note that B, defined in (9.9), also depends on k, as indicated by the parentheses.

Now, the problem of selecting the optimal initial sample size n involves finding

$$\min_n E'_{\tilde r,\tilde k}\left\{k_t\left[\mathrm{Var}''(R) - \frac{B(\tilde k)\,m^*(\tilde k)}{(n'-r_1'-s_1')(n'-r_1'-s_1'+m^*(\tilde k))}\right] + K_s + k_s n + k_f\,m^*(\tilde k)\right\}. \qquad (10.2)$$
Note that we have written the prior expectation operator in (10.2) with the subscript $\tilde r, \tilde k$ in order to remind the reader that in the first stage, r and k are the relevant random variables. Finally, according to (7.2), we write

$$\min_n\left\{k_t\left[\mathrm{Var}'(R) - \mathrm{Var}'E''(R) - E'_{\tilde r,\tilde k}\,\frac{B(\tilde k)\,m^*(\tilde k)}{(n'-r_1'-s_1')(n'-r_1'-s_1'+m^*(\tilde k))}\right] + K_s + k_s n + k_f\,E'_{\tilde r,\tilde k}\,m^*(\tilde k)\right\} \qquad (10.3)$$

and we can advance no further. We know $\mathrm{Var}'E''(R)$ from (7.8), but the remaining two expectations of functions of $m^*(k)$ are very difficult, if not impossible, to evaluate analytically. We leave the reader with this general approach to the problem and the suggestion that numerical methods can be used to carry it to completion.
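As a hedged illustration of the numerical route suggested above, the sketch below evaluates the objective of (10.3) for a given n by summing over the predictive distribution of $\tilde k$. Merging the r and k-r categories of the Dirichlet-multinomial (5.3) makes $\tilde k$ Beta-binomial with parameters $(n;\, r_1'+s_1',\, n')$. Since B and $m^*$ depend on the data only through k, the expectation over $(\tilde r, \tilde k)$ reduces to one over $\tilde k$. The sketch then scans n over a grid. Treating $m^*$ as continuous and the particular cost values are our assumptions, not the authors'; function names are hypothetical.

```python
from math import exp, lgamma, sqrt


def beta_binom_pmf(x, size, a, total):
    """Beta-binomial P(x | size) with Beta parameters (a, total - a)."""
    b = total - a
    log_comb = lgamma(size + 1) - lgamma(x + 1) - lgamma(size - x + 1)
    log_beta = (lgamma(x + a) + lgamma(size - x + b) - lgamma(size + total)
                - lgamma(a) - lgamma(b) + lgamma(total))
    return exp(log_comb + log_beta)


def var_prior_R(r1p, r2p, s1p, n_prime, N):
    """Var'(R) from (7.10); s1p is unused but kept for a uniform signature."""
    return N * (N + n_prime) * (r1p + r2p) * (n_prime - r1p - r2p) / (n_prime**2 * (n_prime + 1))


def var_prepost(r1p, r2p, s1p, n_prime, N, n):
    """Var'E''(R) from (7.8)."""
    d = n_prime - r1p - s1p
    bracket = r1p * (n_prime - r1p) + r2p**2 * (r1p + s1p) / d - 2 * r1p * r2p
    return n * (N + n_prime) ** 2 / ((n + n_prime) * n_prime**2 * (n_prime + 1)) * bracket


def loss_with_followup(n, r1p, r2p, s1p, n_prime, N, k_t, K_s, k_s, k_f):
    """Objective of (10.3) for a first-stage sample of size n."""
    d = n_prime - r1p - s1p
    theta = k_t / k_f                                   # (9.12)
    e_gain = e_m = 0.0
    for k in range(n):                                  # k = n: no non-respondents, m* = 0
        p = beta_binom_pmf(k, n, r1p + s1p, n_prime)    # predictive P'(k)
        B = ((N + n_prime) ** 2 * (d + n - k) ** 2 / ((n_prime + n) ** 2 * (d + 1))
             * r2p * (d - r2p) / d)                     # (9.9)
        pi = (sqrt(theta * B) - d) / (n - k)            # (9.10)
        m = min(max(pi, 0.0), 1.0) * (n - k)            # (9.11)
        e_gain += p * B * m / (d * (d + m))
        e_m += p * m
    return (k_t * (var_prior_R(r1p, r2p, s1p, n_prime, N)
                   - var_prepost(r1p, r2p, s1p, n_prime, N, n) - e_gain)
            + K_s + k_s * n + k_f * e_m)


prior = dict(r1p=1.0, r2p=1.0, s1p=1.0, n_prime=4.0, N=1000)
costs = dict(k_t=1.0, K_s=100.0, k_s=5.0, k_f=20.0)
best_n = min(range(1, 301), key=lambda n: loss_with_followup(n, **prior, **costs))
print(best_n, loss_with_followup(best_n, **prior, **costs))
```

Because m = 0 is always a feasible choice, the computed loss can never exceed the no-follow-up loss (7.9) for the same n, which gives a simple sanity check on any such implementation.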